PREFACE


Statistical mechanics presents two fundamental problems for 
mathematics: (1) the so-called ergodic problem, that is the 
problem of a rigorous justification of replacement of time- 
averages by space (phase)-averages; (2) the problem of the 
creation of an analytic apparatus for the construction of 
asymptotic formulas. In order to become familiar with these 
two groups of problems, a mathematician usually has to over- 
come several difficulties. For understandable reasons, the books 
on physics do not pay much attention to the logical foundation 
of statistical mechanics, and a great majority of them are 
entirely unsatisfactory from a mathematical standpoint, not 
only because of a non-rigorous mathematical discussion (here a 
mathematician would usually be able to put things in order by 
himself), but mainly because of the almost complete absence of 
a precise formulation of the mathematical problems which occur 
in statistical mechanics. 

In the books on physics the formulation of the fundamental 
notions of the theory of probability as a rule is several decades 
behind the present scientific level, and the analytic apparatus of 
the theory of probability, mainly its limit theorems, which 
could be used to establish rigorously the formulas of statistical 
mechanics without any complicated special machinery, is com- 
pletely ignored. 

The present book considers as its main task to make the 
reader familiar with the mathematical treatment of statistical 
mechanics on the basis of modern concepts of the theory of 
probability and a maximum utilization of its analytic apparatus. 
The book is written, above all, for the mathematician, and its 
purpose is to introduce him to the problems of statistical 
mechanics in an atmosphere of logical precision, outside of 
which he cannot assimilate and work, and which, unfortunately, 
is lacking in the existing physical expositions. 

The only essentially new material in this book consists in the 
systematic use of limit theorems of the theory of probability for 
rigorous proofs of asymptotic formulas without any special
analytic apparatus. The few existing expositions which intended 
to give a rigorous proof to these formulas, were forced to use for 
this purpose special, rather cumbersome, mathematical ma- 
chinery. We hope, however, that our exposition of several 
other questions (the ergodic problem, properties of entropy, 
intramolecular correlation, etc.) can claim to be new to a certain 
extent, at least in some of its parts. 


CHAPTER I 


INTRODUCTION 


1. A brief historical sketch. After the molecular theory of 
the structure of matter attained a predominant role in physics, 
the appearance of new statistical (or probabilistic) methods of 
investigation in physical theories became unavoidable. From 
this new point of view each portion of matter (solid, liquid, or 
gaseous) was considered as a collection of a large number of 
very small particles. Very little was known about the nature 
of these particles except that their number was extremely large, 
that in a homogeneous material these particles had the same 
properties, and that these particles were in a certain kind of 
interaction. The dimensions and structure of the particles, as 
well as the laws of the interaction could be determined only
hypothetically. 

Under such conditions the usual mathematical methods of 
investigation of physical theories naturally remained completely 
powerless. For instance, it was impossible to expect to master 
such problems by means of the apparatus of differential equa- 
tions. Even if the structure of the particles and the laws of their 
interaction were known, their exceedingly large number would 
have presented an insurmountable obstacle to the study of 
their motions by such methods of differential equations as are 
used in mechanics. Other methods had to be introduced, for 
which the large number of interacting particles, instead of being 
an obstacle, would become a stimulus for a systematic study of 
physical bodies consisting of these particles. On the other hand, 
the new methods should be such that a lack of information 
concerning the nature of the particles, their structure, and the 
character of their interaction, would not restrict the efficiency of 
these methods. 

All these requirements are satisfied best by the methods of 
the theory of probability. This science has for its main task the 
study of group phenomena, that is, such phenomena as occur in 
collections of a large number of objects of essentially the same
kind. The main purpose of this investigation is the discovery 
of such general laws as are implied by the gross character of the 
phenomena and depend comparatively little on the nature of 
the individual objects. It is clear that the well-known trends of 
the theory of probability fit in the best possible way the afore- 
mentioned special demands of the molecular-physical theories. 
Thus, as a matter of principle, there was no doubt that statis- 
tical methods should become the most important mathematical 
tool in the construction of new physical theories; if there existed 
any disagreement at all, it concerned only the form and the 
domain of application of these methods. 

In the first investigations (Maxwell, Boltzmann) these ap- 
plications of statistical methods were not of a systematical char- 
acter. Fairly vague and somewhat timid probabilistic arguments 
do not pretend here to be the fundamental basis, and play ap- 
proximately the same role as purely mechanical considerations. 
Two features are characteristic of this primary period. First, far 
reaching hypotheses are made concerning the structure and the 
laws of interaction between the particles; usually the particles 
are represented as elastic spheres, the laws of collision of which
are used in an essential way for the construction of the theory. 
Secondly, the notions of the theory of probability do not appear 
in a precise form and are not free from a certain amount of 
confusion which often discredits the mathematical arguments 
by making them either void of any content or even definitely 
incorrect. The limit theorems of the theory of probability do 
not find any application as yet. The mathematical level of all 
these investigations is quite low, and the most important 
mathematical problems which are encountered in this new
domain of application do not yet appear in a precise form.* 

It should be observed, however, that the tendency to restrict 


*An excellent critical analysis of this first period is found in a well-known
work by P. and T. Ehrenfest which appeared in vol. IV of the Encyclo- 
paedie der Mathematischen Wissenschaften and which played a consider- 
able role in the development of the mathematical foundations of statistical 
mechanics. 


the role of statistical methods by introducing purely mechanical 
considerations, (from various hypotheses concerning the laws 
of interaction of particles), is not restricted to the past. This 
tendency is clearly present in many modern investigations. 
According to a historically accepted terminology, such investi-
gations are considered to belong to the kinetic theory of matter,
as distinct from the statistical mechanics which tries to reduce 
such hypotheses to a minimum by using statistical methods as 
much as possible. Each of these two tendencies has its own
advantages. For instance, the kinetic theory is indispensable 
when dealing with problems concerning the motion of separate 
particles (number of collisions, problems concerning the study 
of systems of special kinds, mono-atomic ideal gas); the methods
of the kinetic theory are also often preferable, because they give 
a treatment of the phenomena which is simpler mathematically 
and more detailed. But in questions concerning the theoretical 
foundation of general laws valid for a great variety of systems, 
the kinetic theory naturally becomes sometimes powerless and 
has to be replaced by a theory which makes as few special 
hypotheses as possible concerning the nature of the particles. 
In particular, it was precisely the necessity of a statistical 
foundation for the general laws of thermodynamics that pro- 
duced trends which found their expression in the construction 
of statistical mechanics. To avoid making any special hypotheses 
about the nature of the particles it became necessary in estab-
lishing a statistical foundation to develop laws which had to be
valid no matter what was the nature of these particles (within
quite wide limitations). 

The first systematic exposition of the foundations of statistical 
mechanics, with fairly far developed applications to thermo-
dynamics and some other physical theories, was given in Gibbs’
well-known book.* Besides the above mentioned tendency not
to make any hypotheses about the nature of particles the fol- 
lowing are characteristic of the exposition of Gibbs. 


*W. Gibbs, “Elementary principles of statistical mechanics,” Yale
University Press, 1902. 


(1) A precise introduction of the notion of probability, which
is given here a purely mechanical definition, is lacking,
with the resulting questionable logical precision of all
arguments of statistical character.

(2) The limit theorems of the theory of probability do not
find any application (at that time they were not quite
developed in the theory of probability itself).

(3) The author considers his task not as one of establishing
physical theories directly, but as one of constructing
statistic-mechanical models which have some analogies
in thermodynamics and some other parts of physics;
hence he does not hesitate to introduce some very special
hypotheses of a statistical character (canonical distribu-
tion, ch. 25, § 25) without attempting to prove them or
even to interpret their meaning and significance.

(4) The mathematical level of the book is not high; although
the arguments are clear from the logical standpoint,
they do not pretend to any analytical rigor.


At the time of publication of Gibbs’ book, the fundamental 
problems raised in mathematical science in connection with the 
foundation of statistical mechanics became more or less clear. 
If we disregard some isolated small problems, we have here two 
fundamental groups of problems representing a broad, deep, 
interesting and difficult field of research in mathematics which 
is far from being exhausted even at present. The first of these 
groups is centered around the so-called ergodic problem (ch. 
III), that is, the problem of the logical foundation for the inter- 
pretation of physical quantities by averages of their corre- 
sponding functions, averages taken over the phase-space or a 
suitably selected part of it. This problem, originated by Boltz- 
mann, apparently is far from its complete solution even at the 
present time. This group of problems was neglected by the 
investigators for a long time after some unsuccessful attempts, 
based either on some inappropriate hypotheses introduced ad 
hoc, or on erroneous logical and mathematical arguments 
(which, unfortunately, have been repeated without any criti-
cism in later handbooks). In the book of Gibbs these problems
naturally are not considered because of the tendency to con- 
struct models. Only recently (1931), the remarkable work of 
G. D. Birkhoff again attracted the attention of many investi- 
gators to these problems, and since then this group of problems 
has never ceased to interest mathematicians, who devote more 
and more effort to it every year. We will discuss this group of 
problems in more detail in ch. III.

The second group of problems is connected with the methods 
of computation of the phase-averages. In the majority of cases, 
these averages cannot be calculated precisely. The formulas 
which are derived for them in the general theory (that is, with-
out specification of the mechanical system under discussion) are
complicated, not easy to survey, and as a rule, not suited for
mathematical treatment. It is quite natural, therefore, to try 
to find simpler and more convenient approximations for these 
averages. This problem is always formulated as a problem of 
deriving asymptotic formulas which approach the precise 
formulas when the number of particles constituting the given 
system increases beyond any limit. These asymptotic formulas 
have been found long ago by a semi-heuristic method (by 
means of an unproved extrapolation, starting from some of the 
simplest examples) and were without rigorous mathematical 
justification until fairly recent years. A decided change in this 
direction was brought about by the papers of Darwin and 
Fowler about twenty years ago. Strictly speaking these authors 
were the first to give a systematic computation of the average 
values; up to that time, such a computation was in most cases 
replaced by a more or less convincing determination of “most
probable” values which (without rigorous justification) were
assumed to be approximately equal to the corresponding aver- 
age values. Darwin and Fowler also created a simple, conven- 
ient, and mathematically rigorous apparatus for the computa- 
tion of asymptotic formulas. The only defect of their theory lies 
in an extreme abstruseness of the justification of their mathe-
matical method. To a considerable extent this abstruseness was
due to the fact that the authors did not use the limit theorems 
of the theory of probability (sufficiently developed by that time),
but created anew the necessary analytical apparatus. In any
case, the course in statistical mechanics published by Fowler*
on the basis of this method, represents up to now the only book
on the subject, which is on a satisfactory mathematical level.†

In closing this brief sketch we should mention that the 
development of atomic mechanics during the last decades has 
changed the face of physical statistics to such a degree that, 
naturally, statistical mechanics had to extend its mathematical 
apparatus in order to include also quantum phenomena. More- 
over, from the modern point of view, we should consider 
quantized systems as a general type of which the classical 
systems are a limiting case. Fowler’s course is arranged accord-
ing to precisely this point of view: the new method of construct-
ing asymptotic formulas for phase-averages is established and 
developed for the quantized systems, and the formulas which 
correspond to the classical systems are obtained from these by
a limiting process. 

Quantum statistics also presents some new mathematical
problems. Thus, the justification of the peculiar principles of 
statistical calculations which are the basis of the statistics of 
Bose-Einstein and Fermi-Dirac required mathematical argu-
ments which were distinct as a matter of principle (not only by
their mathematical apparatus) from all those dealt with in the 
classical statistical mechanics. Nevertheless, it could be stated 
that the transition from the classical systems to the quantum 
systems did not introduce any essentially new mathematical 
difficulties. Any method of justification of the statistical me- 
chanics of the classical systems, would require for quantized 


*Fowler, “Statistical mechanics,” Cambridge, 1929. 

†Except, however, the well known course in the theory of probabilities
by v. Mises. However, the main viewpoint of v. Mises differs from the tradi-
tional standpoint to such an extent that the theory expounded by him
hardly could be given the historically established name of statistical me-
chanics; mechanical concepts are almost completely eliminated from this
theory. In any case, we shall have no occasion to compare the exposition
of v. Mises with other expositions.
systems an extension of the analytical apparatus only, in some
cases introducing small difficulties of a technical character but
not presenting new mathematical problems. In places where we
operate with integrals, we might have to use finite sums or series;
continuous distributions of probability might be replaced by
discrete ones, for which completely analogous limit theorems
hold true.

Precisely for these reasons in the present book we have re- 
stricted ourselves to the discussion of the classical systems, 
leaving completely out of consideration everything concerning 
quantum physics, although all the methods which we develop 
after suitable modifications could be applied without any diffi- 
culties to the quantum systems. We have chosen the classical 
systems mainly because our book is designed, in the first place, 
for a mathematical reader, who cannot always be assumed to 
have a sufficient knowledge of the foundations of quantum 
mechanics. On the other hand, we did not consider as expedient 
the inclusion in the book of a brief exposition of these founda- 
tions. Such an inclusion would have considerably increased the 
size of the book, and would not attain the desired purpose since 
quantum mechanics with its novel ideas, often contradicting 
the classical representations, could not be substantially assimi- 
lated by studying such a brief exposition. 


2. Methodological characterization. Statistical mechanics 
has for its purpose the construction of a special physical theory 
which should represent a theoretical basis for some parts of 
physics (in the first place, for thermodynamics) using as few 
special hypotheses as possible. More precisely, statistical me-
chanics considers every kind of matter as a certain mechanical
system and tries to deduce the general physical (in particular, 
thermodynamical) laws governing the behavior of this matter 
from the most general properties of mechanical systems, and eo 
ipso to eliminate from the corresponding parts of physics any 
theoretically unjustified postulation of their fundamental laws. 
The basic assumptions of statistical mechanics should be then 
(1) any general laws which hold for all (or at least for very 


general classes of) mechanical systems, and (2) representations 
of any kind of matter as a mechanical system consisting of a 
very large number of components (particles). Thus the purpose 
of statistical mechanics consists in deriving special properties 
of such many-molecular systems from the general laws of 
mechanics and in showing that, with a suitable physical inter- 
pretation of the most important quantities appearing in the 
theory, these derived special properties will give precisely those 
fundamental physical (and in particular, thermodynamical) 
laws governing matter in general and certain special kinds of
matter in particular. The mathematical method which allows 
us to realize these aims, for the reasons explained in §1, is the 
method of the theory of probabilities. 

Let us make some further remarks concerning the above 
described purpose of statistical mechanics. 

1. The fact that statistical mechanics considers every kind of 
matter as a mechanical system and tries to derive all its proper- 
ties from the general laws of mechanics, often leads to a criti- 
cism of being a priori mechanistic. In fact, however, all re- 
proaches of such kind are based on a misunderstanding. Those 
general laws of mechanics which are used in statistical mechanics 
are necessary for any motions of material particles, no matter 
what are the forces causing such motions. It is a complete 
abstraction from the nature of these forces, that gives to sta- 
tistical mechanics its specific features and contributes to its 
deductions all the necessary flexibility. This is best illustrated 
by the obvious fact that if we modify our point of view on the 
nature of the particles of a certain kind of matter and on the 
character of their interaction, the properties of this kind of 
matter established by methods of statistical mechanics remain 
unchanged by these modifications because no special assump-
tion was made in the process of deduction of these properties.

The circumstance of being governed by the general laws of 
mechanics does not lend any specific features to the systems 
studied in statistical mechanics; as it has been said already, 
these laws govern any motion of matter, whether it has any 


relation to statistical mechanics, or not. The specific character 
of the systems studied in statistical mechanics consists mainly 
in the enormous number of degrees of freedom which these 
systems possess. Methodologically, this means that the stand-
point of statistical mechanics is determined not by the mechan-
ical nature, but by the particle structure of matter. It almost
seems as if the purpose of statistical mechanics is to observe
how far reaching are the deductions made on the basis of the 
atomic structure of matter, irrespective of the nature of these 
atoms and the laws of their interaction. 

2. Since the mechanical basis of statistical mechanics is 
restricted only by those general laws which hold for any 
systems (or at least for very general classes of systems), of 
considerable interest for us (even before the assumption of a
large number of components) are the results of the so-called 
general dynamics, a branch of mechanics whose purpose is the 
deduction of such laws which hold for all mechanical systems 
and can be derived from the general laws of mechanics alone.
This theory, evidently of a considerable philosophical interest, 
is of comparatively recent origin. In the past it was usually 
assumed that the deductions which could be made from the 
general laws of mechanics were not sufficiently concrete to have 
any scientific interest. It developed later that the situation was 
different, and at present the constructions of general dynamics 
are attracting interest of more and more investigators. In par- 
ticular, all the above mentioned investigations of Birkhoff and 
of the increasing number of his disciples belong to this theory. 
It is particularly interesting to us that the methods (and 
partially, problems) of general dynamics, even before any 
assumptions are made concerning the number of degrees of 
freedom of a system under investigation, show a definitely 
expressed statistical tendency. This fact is well-known to 
anyone who has studied investigations in this field with any 
amount of attention. Thus the fundamental theorem of Birkhoff 
is formally equivalent to a certain theorem of the theory of 
probability; conversely, the theory of stationary stochastic 
processes, which represents one of the most interesting chapters
of the modern theory of probability, formally coincides with one 
of the parts of the general dynamics. 

The reason for this can be easily recognized. The most im- 
portant problem of general dynamics is the investigation of the 
dependence of the character of the motion of an arbitrary 
mechanical system on the initial data, or more precisely the 
determination of such characteristics of the motion which in 
one sense or another “almost do not depend” on these initial 
data. Such a quantity for a great majority of trajectories 
assumes values very near to a certain constant number. But the
expression “for a great majority of trajectories” has the mean- 
ing that the set of trajectories which do not satisfy this require- 
ment is metrically negligible in some metric, that is, has for its 
measure either zero or a very small positive number.

In this sense many propositions of general dynamics are of a 
peculiarly typical form. They state that for most general classes 
of mechanical systems the motion is subjected to certain definite 
conditions, if not for all initial data then at least for a metrically 
great majority of them. It is known, however, that propositions 
which can be formulated in such form, in most cases turn out
to be equivalent to some theorems of the theory of probability. 
This theory from a formal point of view could be considered as a 
group of special problems of the theory of measure, namely 
such problems as most often deal with the establishment of a 
metrically negligible smallness of certain sets. It suffices to 
remember that the majority of propositions of the theory of 
functions of a real variable concerned with the notions of con- 
vergence “in measure”, “almost everywhere” etc., finds an 
adequate expression in the terminology of the theory of 
probability. Thus it can be stated that even general dynamics 
which represents the mechanical basis of statistical mechanics, 
is a science which is filled to a great extent with the ideas of the 
theory of probability and which successfully uses its methods
and analogies. 
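
The correspondence described in this paragraph can be written schematically. The display below is only a sketch under assumptions not fixed in the text: $\mu$ denotes a measure on the set of initial data, normalized so that the whole set has measure 1; $f$ is a measurable function of the initial data $P$; $\xi$ is the same function regarded as a random variable; $c$ is a constant; and $\varepsilon$, $\delta$ are small positive numbers.

$$\mu\{P : |f(P) - c| > \varepsilon\} < \delta \quad\longleftrightarrow\quad \mathsf{P}\{|\xi - c| > \varepsilon\} < \delta .$$

In the same dictionary, convergence “in measure” corresponds to convergence in probability, and a property holding “almost everywhere” corresponds to an event of probability 1.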

As to the statistical mechanics, it is a science whose probabil-
istic character is noticeable in two entirely distinct and com-
pletely independent features: in the general dynamics as its
mechanical basis, and in the postulate of a great number of 
degrees of freedom allowing a most fruitful application of 
methods of the theory of probability. 

3. It remains to discuss the form in which methods and results 
of the theory of probability could be utilized in determining 
asymptotic formulas which express approximately the phase 
averages of various functions in the case of a large number of 
degrees of freedom (or for systems consisting of a large number 
of particles). 

As previously mentioned, in most expositions these formulas 
are introduced without any justification. After having derived 
these formulas for some especially simple particular case (for
instance, for a homogeneous mono-atomic ideal gas) the authors
usually extend them to the general case either without any 
justification, or using some arguments of heuristical character. 
Perhaps a single exception from this general rule is represented 
by the method of Fowler. Darwin and Fowler, as was already 
mentioned, develop a special and very abstruse analytical 
apparatus for a mathematical justification of the method of 
obtaining asymptotic formulas, which they have created. No- 
where do they use explicit results of the theory of probability; 
instead, they build a separate logical structure, but, as a matter 
of fact, they are merely moving along an analytical path parallel
to that which is used by the theory of probability in deriving 
its limit theorems. From here only one step remains in attempt-
ing to introduce a method which we consider as the most expedi-
ent: instead of repeating in a complicated formulation the whole
long analytical process which leads to limit theorems of the 
theory of probability, we attempt to find immediately the 
bridge which unifies these two groups of problems, and the 
transition formula which would reduce the entire asymptotic 
problem of the statistical mechanics to the known limit theorem 
of the theory of probabilities. This is the path we will take in 
the present book. In this manner we will be able to achieve
simultaneously two ends: from the methodological point of 
view we will make clear the role of the theory of probabilities in
the statistical mechanics; from the formal point of view we have
the possibility of establishing the propositions of statistical 
mechanics on the basis of the mathematically exact laws of the 
theory of probabilities. In order to emphasize the two above 
mentioned points we will give in the subsequent text the formu- 
lation of the necessary limit theorems of the theory of probabil- 
ities without giving their proofs (the latter will be given in the 
appendix). We hope that such a method of presentation will 
be attractive to many of those readers who are frightened by
the complicated formalistics of the Darwin-Fowler method. 


CHAPTER II 


GEOMETRY AND KINEMATICS OF THE 
PHASE SPACE 


3. The phase space of a mechanical system. In the sta- 
tistical mechanics it is convenient to describe the state of a 
mechanical system G with s degrees of freedom, by values of 
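the Hamiltonian variables $q_1, q_2, \ldots, q_s; p_1, p_2, \ldots, p_s$.
The equations of motion of the system then assume the
“canonical” form

$$\frac{dq_i}{dt} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dt} = -\frac{\partial H}{\partial q_i} \qquad (1 \le i \le s), \tag{1}$$

where H is the so-called Hamiltonian function of the 2s vari-
ables $q_1, \ldots, p_s$ (we always shall assume it not to depend on
time explicitly). The function $H(q_i, p_i)$ is an integral of the
system (1). Indeed, in view of this system of equations,

$$\frac{dH}{dt} = \sum_{i=1}^{s} \frac{\partial H}{\partial q_i}\frac{dq_i}{dt} + \sum_{i=1}^{s} \frac{\partial H}{\partial p_i}\frac{dp_i}{dt} = \sum_{i=1}^{s} \frac{\partial H}{\partial q_i}\frac{\partial H}{\partial p_i} - \sum_{i=1}^{s} \frac{\partial H}{\partial p_i}\frac{\partial H}{\partial q_i} = 0.$$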
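
The conservation of H along solutions of system (1) can be checked numerically on a simple case. The following is a minimal sketch and not part of the original exposition: the Hamiltonian (a harmonic oscillator with one degree of freedom), the constants `m` and `k`, the Runge-Kutta integrator, and all function names are assumptions chosen for illustration.

```python
# Illustrative sketch (assumed example, not from the text): s = 1 and
# H(q, p) = p**2 / (2*m) + k * q**2 / 2, a harmonic oscillator.

def hamiltonian(q, p, m=1.0, k=1.0):
    """Value of the assumed Hamiltonian function at the point (q, p)."""
    return p * p / (2.0 * m) + k * q * q / 2.0

def rhs(q, p, m=1.0, k=1.0):
    """Right-hand sides of the canonical equations (1): (dq/dt, dp/dt)."""
    dq_dt = p / m     # equals  dH/dp
    dp_dt = -k * q    # equals -dH/dq
    return dq_dt, dp_dt

def rk4_step(q, p, dt):
    """One classical fourth-order Runge-Kutta step for system (1)."""
    k1q, k1p = rhs(q, p)
    k2q, k2p = rhs(q + 0.5 * dt * k1q, p + 0.5 * dt * k1p)
    k3q, k3p = rhs(q + 0.5 * dt * k2q, p + 0.5 * dt * k2p)
    k4q, k4p = rhs(q + dt * k3q, p + dt * k3p)
    q_new = q + dt * (k1q + 2 * k2q + 2 * k3q + k4q) / 6.0
    p_new = p + dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6.0
    return q_new, p_new

def max_energy_drift(q0=1.0, p0=0.0, dt=0.01, steps=1000):
    """Largest |H(t) - H(t0)| observed along the numerical trajectory."""
    h0 = hamiltonian(q0, p0)
    q, p = q0, p0
    drift = 0.0
    for _ in range(steps):
        q, p = rk4_step(q, p, dt)
        drift = max(drift, abs(hamiltonian(q, p) - h0))
    return drift

if __name__ == "__main__":
    # The drift is not exactly zero only because the integrator is approximate.
    print(max_energy_drift())
```

The reported drift measures the error of the numerical scheme, since for the exact solution H does not change at all; a smaller step `dt` should reduce it.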


Since system (1) contains only equations of the first order,
the values of the Hamiltonian variables $q_1, \ldots, p_s$ given for
some time $t = t_0$ determine their values for any other time $t$
(succeeding or preceding $t_0$).


Imagine now a Euclidean space T of 2s dimensions, whose 
points are determined by the Cartesian coordinates $q_1, \ldots, p_s$.
Then to each possible state of our mechanical system G
there will correspond a uniquely determined point of the space
T, which we shall call the image point of the given system; 
the whole space T we agree to call the phase space of this 
system. We shall see that, for the purposes of the statistical 
mechanics, the geometrical interpretation of the set of all
possible states of the system by means of its phase space appears
exceedingly fruitful and receives a basic methodological sig- 
nificance. 

Since the state of the system at any given time determines 
uniquely its state at any other time, the motion of the image 
point in the phase space which represents the changes of state 
of the given system depending on time is uniquely determined 
by its initial position. The image point describes in the phase 
space a curve which we shall call a trajectory. It follows that 
through each point of the phase space there passes one and
only one trajectory, and the kinematic law of motion of the 
image point along this trajectory is uniquely determined. 

If at the time $t_0$ the image point of the system G is some
point $M_0$ of the space T, and at the other time $t$ (succeeding
or preceding $t_0$) some other point $M$, then the points $M_0$ and
$M$ determine each other uniquely. We can say that the point
$M_0$ of the phase space during the interval of time $(t_0, t)$ goes
over into $M$. During the same interval of time every other
point of the space T goes over into a definite new position; in
other words, all this space is transformed into itself, and in a
one-to-one way, since, conversely, the position of a point at
the time $t$ determines uniquely its position at the time $t_0$.
Furthermore, if we keep $t_0$ fixed and vary $t$ arbitrarily, we see
that the whole set of possible changes of state of the given system
is represented as a continuous sequence (one-parameter group)
of transformations of its phase space into itself, which sequence
can be considered as a continuous motion of this space in itself.
This representation also turns out to be very convenient for the 
purposes of the statistical mechanics. We shall call the above- 
described motion of the phase space in itself its natural motion. 
Since the displacement of any point of the phase space in its 
natural motion during an interval of time $\Delta t$ depends only on
the initial position of this point and the length of this interval, 
but does not depend on the choice of the initial time, the natural 
motion of the phase space is stationary. This means that the
velocities of points of the phase space in this motion depend
uniquely on the position of these points, but do not change with
the time. 
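
The group and one-to-one properties of the natural motion described above can be checked on a small example. The sketch below is only an illustration under an assumed Hamiltonian, H = (q^2 + p^2)/2, which is not taken from the text; the function name `flow` and the sample point are likewise hypothetical.

```python
import math

def flow(point, t):
    """Natural motion for the assumed Hamiltonian H = (q**2 + p**2) / 2.

    The canonical equations dq/dt = p, dp/dt = -q have the exact solution
    below, a rotation of the (q, p) plane by the angle t.
    """
    q, p = point
    c, s = math.cos(t), math.sin(t)
    return (q * c + p * s, -q * s + p * c)

start = (1.0, 0.5)

# Group property: moving for time 0.7 and then for time 1.1 equals moving for 1.8.
two_steps = flow(flow(start, 0.7), 1.1)
one_step = flow(start, 1.8)

# One-to-one property: the motion over -t undoes the motion over t.
round_trip = flow(flow(start, 2.3), -2.3)
```

Because the displacement depends only on the starting point and on the length of the interval, the same `flow` function serves for every initial time, which is the stationarity noted in the text.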

In what follows, we shall often call the Hamiltonian variables 
q_1, ..., p_s of the given system G the dynamic coordinates of 
its image point in the space Γ, and any function of these 
variables the phase function of the given system. The most 
important phase function is the Hamiltonian function 
H(q_1, ..., p_s). This function determines completely the 
mechanical nature of the given system, because it determines 
completely the system of equations of motion. In particular, 
this function determines completely the natural motion of the 
phase space of the given system. 

When convenient we shall denote the set of the dynamic 
coordinates of the given mechanical system (the point of the 
phase space) by a single letter P, and, correspondingly, an 
arbitrary phase function by f(P). 

There are cases where the phase space Γ has a part Γ' with 
the property that an arbitrary point of this part remains in it 
during all the natural motion of the space Γ. Such a part Γ' 
participates in the natural motion by transforming into itself, 
and therefore it is called an invariant part of the space Γ. In 
what follows we shall see that the motion of an invariant part 
plays a very essential role in the statistical mechanics. 

The special form of the Hamiltonian system (1) has as a 
consequence the fact, easy to foresee, that not every continuous 
transformation of the phase space into itself can appear as its 
natural motion. A natural motion is characterized by some 
special properties, and the most important of these properties 
can be formulated in two theorems on which, to a considerable 
extent, is based the whole construction of the statistical me- 
chanics. We shall pass now to a proof of these theorems. 


4. Theorem of Liouville. The first of these two theorems 
(under slightly more restricted assumptions) was proved by the 
French mathematician Liouville in the middle of the past 
century. 

Let M be any measurable (in the sense of Lebesgue) set of 
points of the phase space Γ of the given mechanical system. In 
the natural motion of this space the set M goes over into an- 
other set M_t during an interval of time t. The theorem of 
Liouville asserts that the measure of the set M_t for any t 
coincides with the measure of the set M. In other words, the 
measure of measurable point-sets is an invariant of the natural 
motion of the space Γ. 

For the proof of this theorem it will be convenient to introduce 
a more uniform notation for the dynamic coordinates of 
points of the space Γ. Let 

\[ x_i = q_i, \qquad x_{s+i} = p_i \qquad (i = 1, 2, \ldots, s) \]

and 

\[ X_i = \frac{\partial H}{\partial p_i}, \qquad X_{s+i} = -\frac{\partial H}{\partial q_i} \qquad (i = 1, 2, \ldots, s). \]

In this notation the canonical system (1) of §3 can be 
written in the form 

\[ \frac{dx_i}{dt} = X_i(x_1, \ldots, x_{2s}) \qquad (1 \le i \le 2s). \tag{2} \]

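For what follows let us observe that 

\[ \sum_{i=1}^{2s} \frac{\partial X_i}{\partial x_i} = \sum_{i=1}^{s} \left( \frac{\partial^2 H}{\partial q_i\, \partial p_i} - \frac{\partial^2 H}{\partial p_i\, \partial q_i} \right) = 0. \tag{3} \]

If x_i^{(0)} (i = 1, 2, ..., 2s) are the values of the variables x_i at 
some definite time t_0, we obtain as a uniquely determined 
solution of the system (2), the system of functions 

\[ x_i = f_i(t;\, x_1^{(0)}, \ldots, x_{2s}^{(0)}) \qquad (1 \le i \le 2s). \]
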
Let us agree to denote the measure of the set A by 𝔐A. Then 

\[ \mathfrak{M}M_t = \int_{M_t} dx_1 \cdots dx_{2s}. \]

In this integral let us change the variables by setting 

\[ x_i = f_i(t;\, y_1, \ldots, y_{2s}) \qquad (1 \le i \le 2s), \]

where t is considered as an auxiliary parameter. Since the point 
(y_1, ..., y_{2s}) of the space Γ obviously describes the set M 
when the point (x_1, ..., x_{2s}) describes the set M_t, 

\[ \mathfrak{M}M_t = \int_M J(t;\, y_1, \ldots, y_{2s})\, dy_1 \cdots dy_{2s},^{1} \]

where 

\[ J = J(t;\, y_1, \ldots, y_{2s}) = \frac{\partial(x_1, \ldots, x_{2s})}{\partial(y_1, \ldots, y_{2s})}. \]

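If we differentiate this with respect to t we find 

\[ \frac{d\, \mathfrak{M}M_t}{dt} = \int_M \frac{\partial J}{\partial t}\, dy_1 \cdots dy_{2s}. \tag{4} \]

We can compute ∂J/∂t by the rule of differentiation of de- 
terminants. We find 

\[ \frac{\partial J}{\partial t} = \sum_{i=1}^{2s} J_i, \tag{5} \]

where 

\[ J_i = \frac{\partial(x_1, \ldots, x_{i-1},\, \partial x_i/\partial t,\, x_{i+1}, \ldots, x_{2s})}{\partial(y_1, \ldots, y_{2s})} \qquad (1 \le i \le 2s). \]

In view of the system (2), since ∂x_i/∂t coincides with dx_i/dt, 

\[ J_i = \frac{\partial(x_1, \ldots, x_{i-1},\, X_i,\, x_{i+1}, \ldots, x_{2s})}{\partial(y_1, \ldots, y_{2s})} \qquad (1 \le i \le 2s). \]
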
But 

\[ \frac{\partial X_i}{\partial y_k} = \sum_{r=1}^{2s} \frac{\partial X_i}{\partial x_r}\, \frac{\partial x_r}{\partial y_k} \qquad (1 \le i \le 2s,\ 1 \le k \le 2s), \]

hence the previous equality gives 

\[ J_i = \sum_{r=1}^{2s} \frac{\partial X_i}{\partial x_r}\, \frac{\partial(x_1, \ldots, x_{i-1},\, x_r,\, x_{i+1}, \ldots, x_{2s})}{\partial(y_1, \ldots, y_{2s})} \qquad (1 \le i \le 2s). \]

^1 In this book we shall, in most cases, denote multiple integrals using only 
one integral sign; the dimensionality of the integral will be determined 
either by the number of differentials under the integral sign, or by some 
other obvious considerations. In cases where the domain of integration is 
not indicated explicitly, the integration will be taken over the whole space. 

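But clearly 

\[ \frac{\partial(x_1, \ldots, x_{i-1},\, x_r,\, x_{i+1}, \ldots, x_{2s})}{\partial(y_1, \ldots, y_{2s})} = \begin{cases} J, & \text{if } r = i, \\ 0, & \text{if } r \ne i, \end{cases} \]

whence 

\[ J_i = J\, \frac{\partial X_i}{\partial x_i} \qquad (1 \le i \le 2s). \]

On substituting into (5) and using (3) we have 

\[ \frac{\partial J}{\partial t} = J \sum_{i=1}^{2s} \frac{\partial X_i}{\partial x_i} = 0. \]

Then (4) shows 

\[ \frac{d\, \mathfrak{M}M_t}{dt} = 0, \]
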
which proves the invariance of the measure in the natural 
motion of the phase space. 
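
The key step of the proof, that the Jacobian J of the natural motion stays equal to 1, can be checked numerically. The following is a rough sketch under assumptions that are not in the text: the pendulum Hamiltonian H = p^2/2 - cos q, a classical Runge-Kutta integrator standing in for the exact flow, and central differences for the partial derivatives. All function names are hypothetical.

```python
import math

def velocity(q, p):
    # Canonical equations for the assumed Hamiltonian H = p**2 / 2 - cos(q).
    return p, -math.sin(q)

def flow(q, p, t, steps=2000):
    """Approximate the natural motion over time t with classical RK4 steps."""
    h = t / steps
    for _ in range(steps):
        k1q, k1p = velocity(q, p)
        k2q, k2p = velocity(q + 0.5 * h * k1q, p + 0.5 * h * k1p)
        k3q, k3p = velocity(q + 0.5 * h * k2q, p + 0.5 * h * k2p)
        k4q, k4p = velocity(q + h * k3q, p + h * k3p)
        q += h * (k1q + 2 * k2q + 2 * k3q + k4q) / 6
        p += h * (k1p + 2 * k2p + 2 * k3p + k4p) / 6
    return q, p

def jacobian_determinant(q, p, t, eps=1e-5):
    """Central-difference estimate of J = d(x_1, x_2) / d(y_1, y_2)."""
    q_plus, p_plus = flow(q + eps, p, t)
    q_minus, p_minus = flow(q - eps, p, t)
    dq_dq = (q_plus - q_minus) / (2 * eps)
    dp_dq = (p_plus - p_minus) / (2 * eps)
    q_plus, p_plus = flow(q, p + eps, t)
    q_minus, p_minus = flow(q, p - eps, t)
    dq_dp = (q_plus - q_minus) / (2 * eps)
    dp_dp = (p_plus - p_minus) / (2 * eps)
    return dq_dq * dp_dp - dq_dp * dp_dq

determinant = jacobian_determinant(1.0, 0.3, 3.0)
```

Within the accuracy of the integrator and of the finite differences, `determinant` should come out close to 1, in agreement with the invariance of measure.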

Corollary. In the natural motion of the phase space every 
point P, during an interval of time t, goes over into a uniquely 
determined other point which we always shall denote by P_t. 
If f(P) is an arbitrary phase function, we shall write 

\[ f(P_t) = f(P, t); \]

here t might be also negative. Now, let M be a Lebesgue 
measurable set of points of the space Γ, of finite measure, and 
f(P) a phase function, Lebesgue integrable over Γ. By the 
Liouville theorem the volume element dV of the space Γ during 
the time t goes over into an equal volume element dV_t. Let 
us consider the integral 

\[ \int_{M_t} f(P)\, dV_t \]

and let us change the variables by introducing as new variables 
the dynamic coordinates of the point which goes over into P 
during the time t. It is clear that: (1) the new domain of inte- 
gration will be the set M; (2) under the sign of integral the 
symbolic argument P should be replaced by P_t; (3) the element 
dV_t should be replaced by the equal element dV. Thus we get 

\[ \int_{M_t} f(P)\, dV_t = \int_M f(P_t)\, dV = \int_M f(P, t)\, dV. \]

In the left hand member we also can write dV instead of 
dV_t, so that 

\[ \int_{M_t} f(P)\, dV = \int_M f(P, t)\, dV. \tag{6} \]

In particular, if the set M is invariant, then, for each t, 

\[ \int_M f(P)\, dV = \int_M f(P, t)\, dV. \tag{7} \]
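
Formula (7) can be illustrated on an invariant set of a simple system. The sketch below assumes the Hamiltonian H = (q^2 + p^2)/2, for which every disk centered at the origin is an invariant part of the phase plane; the phase function `f`, the quadrature rule and all names are illustrative choices, not taken from the text.

```python
import math

def flow(q, p, t):
    # Exact natural motion for the assumed Hamiltonian H = (q**2 + p**2) / 2.
    c, s = math.cos(t), math.sin(t)
    return q * c + p * s, -q * s + p * c

def f(q, p):
    # An arbitrary illustrative phase function.
    return q * q + 0.3 * q * p + p

def integral_over_disk(g, radius=1.0, n_r=200, n_theta=200):
    """Midpoint rule in polar coordinates for the integral of g over the disk M."""
    dr, dtheta = radius / n_r, 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_theta):
            theta = (j + 0.5) * dtheta
            total += g(r * math.cos(theta), r * math.sin(theta)) * r * dr * dtheta
    return total

t = 0.9
lhs = integral_over_disk(f)                               # integral of f(P) dV over M
rhs = integral_over_disk(lambda q, p: f(*flow(q, p, t)))  # integral of f(P, t) dV over M
```

The two numbers `lhs` and `rhs` should agree up to rounding, and both approximate the exact value π/4 of the integral of q^2 over the unit disk (the other two terms of `f` integrate to zero).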


5. Theorem of Birkhoff. The second theorem, to the proof 
of which we have to turn now, was proved comparatively 
recently (in 1931) by G. D. Birkhoff (the form of the proof 
which we are giving here is due to A. N. Kolmogoroff). 

Let V be an invariant part of the phase space of a finite 
volume, f(P) a phase function summable over V and deter- 
mined at all points* P ∈ V. 

The theorem of Birkhoff asserts that the limit 

\[ \lim_{C \to \infty} \frac{1}{C} \int_0^C f(P, t)\, dt \]

exists for all points P of the set V, except at most of a certain 
set of measure zero (or, more concisely, almost everywhere 
on V). The limit also exists almost everywhere when C → −∞. 


*The notation a ∈ A means that a is an element of the set A. 

It is clear that we can interpret the quantity (1/C) ∫_0^C f(P, t) dt 
as the average of the function f(P) along the trajectory passing 
through P during the interval of time (0, C). The limit of this 
expression as C → ∞ we shall call the time average of the 
function f(P) along the trajectory passing through P. The 
theorem of Birkhoff asserts that, for a summable function, the 
time averages exist along the trajectories passing through 
almost all points of V. 
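
A time average of this kind is easy to approximate when the trajectory is known explicitly. The sketch below is an illustration only: it assumes the Hamiltonian H = (q^2 + p^2)/2 and the phase function f = q^2, neither of which comes from the text, and it uses a midpoint rule for the integral over (0, C).

```python
import math

def f_along_trajectory(q0, p0, t):
    # f(P, t) for f(q, p) = q**2 under the assumed flow of H = (q**2 + p**2) / 2.
    q_t = q0 * math.cos(t) + p0 * math.sin(t)
    return q_t * q_t

def time_average(q0, p0, C, samples_per_unit=200):
    """Midpoint-rule estimate of (1/C) * integral from 0 to C of f(P, t) dt."""
    n = int(C * samples_per_unit)
    h = C / n
    total = sum(f_along_trajectory(q0, p0, (k + 0.5) * h) for k in range(n))
    return total * h / C

q0, p0 = 1.0, 0.5
averages = [time_average(q0, p0, C) for C in (10.0, 100.0, 1000.0)]
limit = (q0 * q0 + p0 * p0) / 2  # value the averages should approach
```

For this particular system the deviation of the average over (0, C) from `limit` is bounded by (q0^2 + p0^2)/(2C), so the entries of `averages` approach the limit at the rate 1/C.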

We now pass to the proof. In what follows we set for any 
integer n 

\[ \int_n^{n+1} f(P, t)\, dt = x_n = x_n(P), \]

\[ \int_n^{n+1} |f(P, t)|\, dt = y_n = y_n(P). \]

Lemma 1. Almost everywhere on V for n → ∞ 

\[ \frac{1}{n}\, y_n(P) \to 0. \]


The proof of this lemma is based on the Liouville theorem. 
If we introduce a new variable α in the integral defining y_n, 
determined by t = n + α, we get 

\[ y_n(P) = \int_0^1 |f(P, n + \alpha)|\, d\alpha = \int_0^1 |f(P_n, \alpha)|\, d\alpha = y_0(P_n), \tag{8} \]

since obviously f(P, n + α) = f(P_n, α). 


Let us denote by E_{n,n} and E_{n,0} the sets of points P belonging 
to V and satisfying respectively the conditions 

\[ y_n(P) > \varepsilon n \quad \text{and} \quad y_0(P) > \varepsilon n, \]

where ε is an arbitrary fixed positive number. It is easy to see 
that in the natural motion of the space Γ the set E_{n,n} during 
the time n goes over into E_{n,0}. Indeed, in view of (8) the 
inequality y_n(P) > εn is equivalent to y_0(P_n) > εn. Hence 
P ∈ E_{n,n} implies P_n ∈ E_{n,0}, and conversely. Therefore, by 
Liouville's theorem, 

\[ \mathfrak{M}E_{n,n} = \mathfrak{M}E_{n,0}. \tag{9} \]

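Now we show that the series 

\[ \sum_{n=1}^{\infty} \mathfrak{M}E_{n,n} \]

converges. In view of (9) this series can be written as 

\[ \sum_{n=1}^{\infty} \mathfrak{M}E_{n,0}. \]

If we denote by F_m the set of points of V for which mε < 
y_0(P) ≤ (m + 1)ε and observe that E_{n,0} = Σ_{m=n}^∞ F_m, we can 
write this series in the form 

\[ \begin{aligned}
\sum_{n=1}^{\infty} \sum_{m=n}^{\infty} \mathfrak{M}F_m &= \sum_{m=1}^{\infty} \sum_{n=1}^{m} \mathfrak{M}F_m = \sum_{m=1}^{\infty} m\, \mathfrak{M}F_m \\
&= \frac{1}{\varepsilon} \sum_{m=1}^{\infty} m\varepsilon\, \mathfrak{M}F_m \le \frac{1}{\varepsilon} \sum_{m=1}^{\infty} \int_{F_m} y_0(P)\, dV \\
&\le \frac{1}{\varepsilon} \int_V y_0(P)\, dV = \frac{1}{\varepsilon} \int_V dV \int_0^1 |f(P, \alpha)|\, d\alpha \\
&= \frac{1}{\varepsilon} \int_0^1 d\alpha \int_V |f(P, \alpha)|\, dV,
\end{aligned} \]

where dV is the element of volume of the space Γ. 
Since V is an invariant part of the space Γ, the last expres- 
sion, in view of (7), is 

\[ \frac{1}{\varepsilon} \int_0^1 d\alpha \int_V |f(P)|\, dV = \frac{1}{\varepsilon} \int_V |f(P)|\, dV. \]

Since f(P) by assumption is summable over V, this is a finite 
number, which proves our assertion. 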

By a known theorem of the metric theory of sets it follows 
that every point of the set V, except at most a set of measure 
zero, belongs to no more than a finite number of sets of the 
sequence E_{n,n} (n = 1, 2, ...). In other words, for almost all 
points P ∈ V there exists a number N = N(P) such that 
for each n > N, 

\[ y_n(P) \le \varepsilon n. \]

Since ε is arbitrary, Lemma 1 is proved. 
Let now for any a < b, 

\[ h_{ab}(P) = \frac{1}{b - a} \int_a^b f(P, t)\, dt. \]

By the definition of x_n, if a and b are integers, we have 

\[ h_{ab}(P) = \frac{1}{b - a}\, (x_a + x_{a+1} + \cdots + x_{b-1}). \]


Lemma 2. If h_{0n}(P), as n → ∞ assuming integer values, has 
no limit on a set M of positive measure, then there exist two 
real numbers α and β (α < β) and a part M* of the set M, 
such that 𝔐M* > 0 and at each point P ∈ M*, 

\[ l(P) = \liminf_{n \to \infty} h_{0n}(P) < \alpha, \]
\[ L(P) = \limsup_{n \to \infty} h_{0n}(P) > \beta. \]


This proposition, which is almost self-evident, can be easily 
proved. Let us consider the set of all intervals δ_n = (α_n, β_n) 
with rational end points (the order of numeration is imma- 
terial). If P ∈ M then l(P) < L(P)*, and therefore among 
the intervals δ_n there will be found the first one, say δ_m, for 
which 

\[ l(P) < \alpha_m < \beta_m < L(P). \]

Denote by M_m the set of all points P of M which are connected 
in this sense with the interval δ_m. It is clear that 

\[ M = \sum_{m=1}^{\infty} M_m \]

and that the sets M_i and M_k for i ≠ k have no points in com- 
mon. Since 𝔐M > 0, we will have 𝔐M_m > 0 for at least one 
value of m. On setting α = α_m, β = β_m, M_m = M* we see 
that Lemma 2 is proved. 

*Except for the cases where h_{0n}(P) → +∞ or h_{0n}(P) → −∞. It is easy 
to see however that for a summable function this can occur only on a set of 
measure zero. Indeed, if we had, say, h_{0n}(P) → ∞ on a set of positive meas- 
ure, then, by a known theorem of Egoroff we could assert the uniformity 
of this process on a certain other set N, 𝔐N > 0. Let A > 0 be arbitrarily 
large and let for n > n_0 = n_0(A), h_{0n}(P) > A on N. Assuming n > n_0 and 
integrating over N, in view of (6) we have 

\[ A\, \mathfrak{M}N < \int_N h_{0n}(P)\, dV = \frac{1}{n} \int_0^n d\alpha \int_N f(P, \alpha)\, dV = \frac{1}{n} \int_0^n d\alpha \int_{N_\alpha} f(P)\, dV \le \int_V |f(P)|\, dV, \]

which leads to a contradiction since A is arbitrarily large. 

Assume now that the conditions of Lemma 2 are satisfied. 
Let P ∈ M* and consider an interval (a, b) where a < b are 
integers. We shall call this segment a proper segment of the 
point P if the following conditions are satisfied: 

\[ (1) \quad h_{ab}(P) > \beta, \]

\[ (2) \quad h_{ab'}(P) \le \beta \quad \text{for } a < b' < b. \]

We shall show that two proper segments (a_1, b_1) and (a_2, b_2) 
of the same point P cannot partially overlap each other. 
Indeed, if we had for instance a_1 < a_2 < b_1 < b_2, then we 
would have 

\[ (b_1 - a_1)\, h_{a_1 b_1} = (a_2 - a_1)\, h_{a_1 a_2} + (b_1 - a_2)\, h_{a_2 b_1}, \]

while, by (2), 

\[ h_{a_1 a_2} \le \beta, \qquad h_{a_2 b_1} \le \beta, \qquad h_{a_1 b_1} > \beta, \]

which would lead to the contradictory relation 

\[ \beta (b_1 - a_1) < \beta (a_2 - a_1) + \beta (b_1 - a_2) = \beta (b_1 - a_1). \]
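
The definition of a proper segment and the non-overlap property just proved can be tried out on a finite list of numbers. The sketch below is an illustration under stated assumptions: the values in `xs` are made up and stand in for x_0, x_1, ..., and only segments with end points between 0 and the length of the list are considered, whereas the text works with all integers.

```python
from itertools import combinations

def mean(xs, a, b):
    # h_ab for integer end points: the mean of x_a, ..., x_{b-1}.
    return sum(xs[a:b]) / (b - a)

def proper_segments(xs, beta):
    """All (a, b) with h_ab > beta and h_ab' <= beta for every a < b' < b."""
    found = []
    for a in range(len(xs)):
        for b in range(a + 1, len(xs) + 1):
            if mean(xs, a, b) > beta:
                found.append((a, b))
                break  # a longer segment from a would violate condition (2)
    return found

def partially_overlap(first, second):
    (a1, b1), (a2, b2) = sorted([first, second])
    return a1 < a2 < b1 < b2

xs = [0.2, 1.4, -0.3, 0.9, 2.0, -1.1, 0.1, 1.7, 0.4, -0.6, 1.2, 0.8]
beta = 0.5
segments = proper_segments(xs, beta)
overlaps = [pair for pair in combinations(segments, 2) if partially_overlap(*pair)]
```

Each left end point a carries at most one proper segment, namely the one ending at the first b for which the mean exceeds β. Any two of the segments found are therefore either disjoint or nested, and `overlaps` stays empty.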


Furthermore, let us agree to call a proper segment of P a 
maximal proper segment of rank s, if its length does not 
exceed s, and if it is not contained in any other proper segment 
of P whose length does not exceed s. It is easy to see that 


every proper segment of length not exceeding s is contained 
in one and only one maximal proper segment of rank s. Indeed, 
among all the proper segments of length not exceeding s and 
containing the given segment, there will be one of maximal 
length. It is clear that this will be a maximal proper segment 
of rank s. Its uniqueness follows from the fact that if there 
existed two such segments, then they would have points in 
common (as containing the given segment). But then either 
one of them would be contained in the other, and therefore 
would not be a maximal segment of rank s, or they would 
partially overlap, which has just been proved to be impossible. 

For every positive integer s let us denote by M_s the set of 
points P of the set M* for which the inequality 

\[ h_{0n}(P) > \beta \]

holds for at least one n ≤ s. It is obvious that every P ∈ M* 
belongs to all M_s when s is sufficiently large, so that 

\[ M^* = \sum_{s=1}^{\infty} M_s \]

and, since M_s ⊂ M_{s+1}, 

\[ \mathfrak{M}M^* = \lim_{s \to \infty} \mathfrak{M}M_s. \]

But 𝔐M* > 0, hence, for s sufficiently large, also 𝔐M_s > 0. 
In what follows we shall denote by s some fixed positive integer 
satisfying this condition. 

Lemma 3. In order that P would belong to the set M_s it is 
necessary and sufficient that P would have a maximal proper 
segment (a, b) of rank s, such that a ≤ 0 < b. 

Proof. 1) Let P ∈ M_s and let n be the smallest positive integer 
for which h_{0n}(P) > β, so that n ≤ s. Then clearly, the seg- 
ment (0, n) is a proper segment of P. As it has been proved 
above, this segment is contained in a unique maximal proper 
segment of rank s, which satisfies all conditions of Lemma 3. 

2) Let P have a maximal proper segment (a, b) of rank s, 
such that a ≤ 0 < b. To complete the proof of Lemma 3 it is 
sufficient to show that in this case h_{0b}(P) > β, because b ≤ 
b − a ≤ s. 

If a = 0, our statement is obvious because (a, b) is a proper 
segment of the point P, whence h_{ab}(P) > β. If a < 0, 

\[ (b - a)\, h_{ab}(P) = (x_a + \cdots + x_{-1}) + (x_0 + \cdots + x_{b-1}), \]

or 

\[ (b - a)\, h_{ab}(P) = -a\, h_{a0}(P) + b\, h_{0b}(P), \]

whence 

\[ h_{0b}(P) = \frac{1}{b} \left[ (b - a)\, h_{ab}(P) + a\, h_{a0}(P) \right]. \]

But, by definition of a proper segment, 

\[ h_{ab}(P) > \beta, \qquad h_{a0}(P) \le \beta, \]

and since b − a > 0 and a < 0, 

\[ h_{0b}(P) > \frac{1}{b} \left[ (b - a)\beta + a\beta \right] = \beta. \]


Consider now any point P of the set M_s and a maximal 
proper segment (a, b) of rank s corresponding to P in the 
sense of Lemma 3, so that a ≤ 0 < b, b − a ≤ s. Let b − a = q, 
−a = p, then 1 ≤ q ≤ s, 0 ≤ p ≤ q − 1. In what follows 
we denote by δ_{pq} the segment (−p, −p + q) and by M_{pq} the 
set of all points of M_s which correspond to the segment δ_{pq} 
in the sense of Lemma 3, so that 

\[ M_s = \sum_{q=1}^{s} \sum_{p=0}^{q-1} M_{pq}. \]

It is easy to see that in the natural motion of the space Γ the 
set M_{0q} after p units of time goes over into M_{pq} (because 
h_{ab}(P) = h_{a−p, b−p}(P_p)). Thus 

\[ \mathfrak{M}M_{pq} = \mathfrak{M}M_{0q} \qquad (0 \le p \le q - 1). \]

It is also clear that the sets M_{pq} with different pairs of indices 
cannot have points in common. Finally, in view of formula (6), 
the same relationship between the sets M_{pq} and M_{0q} shows that 
for any summable function φ(P), 

\[ \int_{M_{pq}} \varphi(P)\, dV = \int_{M_{0q}} \varphi(P, p)\, dV. \]

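From all that has been said above we conclude 

\[ \begin{aligned}
\int_{M_s} x_0(P)\, dV &= \sum_{q=1}^{s} \sum_{p=0}^{q-1} \int_{M_{pq}} x_0(P)\, dV = \sum_{q=1}^{s} \sum_{p=0}^{q-1} \int_{M_{0q}} x_0(P_p)\, dV \\
&= \sum_{q=1}^{s} \sum_{p=0}^{q-1} \int_{M_{0q}} dV \int_0^1 f(P_p, t)\, dt = \sum_{q=1}^{s} \sum_{p=0}^{q-1} \int_{M_{0q}} dV \int_p^{p+1} f(P, \alpha)\, d\alpha \\
&= \sum_{q=1}^{s} \int_{M_{0q}} dV \int_0^q f(P, \alpha)\, d\alpha = \sum_{q=1}^{s} \int_{M_{0q}} q\, h_{0q}(P)\, dV \\
&> \beta \sum_{q=1}^{s} q\, \mathfrak{M}M_{0q} = \beta \sum_{q=1}^{s} \sum_{p=0}^{q-1} \mathfrak{M}M_{pq} = \beta\, \mathfrak{M}M_s.
\end{aligned} \]
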


This relation holds for all s sufficiently large; on allowing 
s → ∞ we get 

\[ \int_{M^*} x_0(P)\, dV \ge \beta\, \mathfrak{M}M^*. \tag{10} \]

Since for all points of M* we also have 

\[ \liminf_{n \to \infty} h_{0n}(P) < \alpha, \]

we can prove in the same way that 

\[ \int_{M^*} x_0(P)\, dV \le \alpha\, \mathfrak{M}M^*. \tag{11} \]

Since α < β, the inequalities (10) and (11) are contradictory, 
which shows that our assumption 𝔐M* > 0 is not possible, 
or in other words, that the limit 

\[ \lim_{n \to \infty} h_{0n}(P) \]

must exist almost everywhere. 

To accomplish the proof of Birkhoff's theorem it remains to 
remove the restriction that the parameter n assumes only 
integral values. This is easily done by means of Lemma 1. 
Indeed, since the expression 

\[ \frac{1}{b} \int_0^{[b]} f(P, t)\, dt \]

(where [b] is the largest integer contained in b), as b → ∞ 
differs from (1/[b]) ∫_0^{[b]} f(P, t) dt only by a factor which tends 
to 1, and since the latter expression has a limit almost every- 
where, the limit 

\[ \lim_{b \to \infty} \frac{1}{b} \int_0^{[b]} f(P, t)\, dt \]

also exists almost everywhere. On the other hand 

\[ \left| \frac{1}{b} \int_0^{b} f(P, t)\, dt - \frac{1}{b} \int_0^{[b]} f(P, t)\, dt \right| \le \frac{1}{b} \int_{[b]}^{[b]+1} |f(P, t)|\, dt = \frac{1}{b}\, y_{[b]}(P) \to 0 \qquad (b \to \infty) \]

by Lemma 1. Hence the limit 

\[ \lim_{b \to \infty} \frac{1}{b} \int_0^{b} f(P, t)\, dt \]


also exists almost everywhere, which completes the proof of 
the theorem of Birkhoff. 


6. Case of metric indecomposability. We shall call the 
quantity 


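\[ \hat f(P) = \lim_{C \to \infty} \frac{1}{C} \int_0^C f(P, t)\, dt \]

the “time average” of the function f(P) along the trajectory 
passing through P. Such a terminology, strictly speaking, 
becomes suitable only after we know that this quantity does 
not depend on the choice of the initial point on the given 
trajectory, in other words, that for all P and t 

\[ \hat f(P_t) = \hat f(P) \]

(assuming of course that f̂(P) exists). We shall prove this 
property. 

Let, for definiteness, t > 0. Since, by assumption, the limit 

\[ \lim_{C \to \infty} \frac{1}{C + t} \int_0^{C+t} f(P, \alpha)\, d\alpha = \hat f(P) \]

exists, and since the difference 

\[ \frac{1}{C} \int_0^{C+t} f(P, \alpha)\, d\alpha - \frac{1}{C + t} \int_0^{C+t} f(P, \alpha)\, d\alpha = \frac{t}{C(C + t)} \int_0^{C+t} f(P, \alpha)\, d\alpha \to 0 \qquad (C \to \infty), \]

we also have 

\[ \lim_{C \to \infty} \frac{1}{C} \int_0^{C+t} f(P, \alpha)\, d\alpha = \hat f(P). \tag{12} \]

But 

\[ \frac{1}{C} \int_0^{C} f(P_t, \alpha)\, d\alpha = \frac{1}{C} \int_0^{C} f(P, t + \alpha)\, d\alpha = \frac{1}{C} \int_t^{C+t} f(P, \alpha)\, d\alpha = \frac{1}{C} \int_0^{C+t} f(P, \alpha)\, d\alpha - \frac{1}{C} \int_0^{t} f(P, \alpha)\, d\alpha. \]

In the right-hand member the first term tends to f̂(P), by (12), 
and the second term tends to 0 as C → ∞. Hence 

\[ \lim_{C \to \infty} \frac{1}{C} \int_0^{C} f(P_t, \alpha)\, d\alpha = \hat f(P). \]

By definition this limit is f̂(P_t), which proves our assertion. 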
We turn now to the discussion of the most important special 
case of Birkhoff’s theorem. Let V be some invariant part (of 
finite volume) of the space T. We shall call this part metrically 
indecomposable if it cannot be represented in the form 


\[ V = V_1 + V_2, \]


where V_1 and V_2 are invariant parts of positive measure. In 
order to understand clearly the content of this notion, observe 
that the set V, as any invariant set, is a certain set of complete 
trajectories. If by any method we separate this set of tra- 
jectories into two other sets (each consisting again of complete 
trajectories), then, if V is metrically indecomposable, only one 
of the following two cases is possible: either one of the com- 
ponent parts has measure zero (hence the other has measure 
𝔐V), or both components are not measurable. In the case the 
set V is metrically indecomposable, Birkhoff’s theorem can be 
made considerably more precise. 

Theorem: If the set V is metrically indecomposable, then 
almost everywhere on V 


\[ \hat f(P) = \frac{1}{\mathfrak{M}V} \int_V f(P)\, dV. \]


The quantity in the right-hand member of this equality 
could be interpreted as an average of the function f on the 
set V. We shall call it the phase average of the function f (on 
the set V), and denote it by f̄. Thus the above stated theorem 
asserts that, in the case of the metric indecomposability of the 
set V, the time average f̂(P) of any summable function f, for 
almost all initial points P is the same, and coincides with the 
phase average f̄ of the same function. 

In order to prove our theorem, let us prove first that 
f̂(P) is constant almost everywhere on V. Otherwise, there 
would exist such a real number α that in splitting V into two 
parts V_1 and V_2 which are defined respectively by the condi- 
tions f̂(P) > α on V_1 and f̂(P) ≤ α on V_2, we would have 
𝔐V_1 > 0 and 𝔐V_2 > 0.* But, by what was proved at the 
beginning of this paragraph, the sets V_1 and V_2 are invariant, 
which implies a contradiction to the set V being metrically 
indecomposable. Thus f̂(P) almost everywhere on V has the 
same constant value which we denote by a. It remains to prove 
that a = f̄. 


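Let 

\[ f_C(P) = \frac{1}{C} \int_0^C f(P, t)\, dt. \]

Then 

\[ a = \frac{1}{\mathfrak{M}V} \int_V [a - f_C(P)]\, dV + \frac{1}{\mathfrak{M}V} \int_V f_C(P)\, dV. \]

By the invariance of the set V (see (7)), 

\[ \frac{1}{\mathfrak{M}V} \int_V f_C(P)\, dV = \frac{1}{C\, \mathfrak{M}V} \int_0^C dt \int_V f(P, t)\, dV = \frac{1}{C\, \mathfrak{M}V} \int_0^C dt \int_V f(P)\, dV = \frac{1}{\mathfrak{M}V} \int_V f(P)\, dV = \bar f. \]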


*In order to prove this, for any positive integer n, let us subdivide the 
axis of reals into segments (k/2^n, (k + 1)/2^n) (−∞ < k < ∞), and let us 
call such a segment an essential segment, if the set of points P ∈ V for 
which values of f̂(P) belong to this segment, has a positive measure. If for 
some value of n there exist two essential segments, our assertion is proved. 
If, however, for every n there exists only one essential segment δ_n, then 
clearly δ_{n+1} ⊂ δ_n, so that the sequence of segments δ_n (n = 1, 2, ...) has a 
single point in common a. It is quite obvious that, in this case, f̂(P) = a 
almost everywhere on V. 
Hence 

\[ a = \frac{1}{\mathfrak{M}V} \int_V [a - f_C(P)]\, dV + \bar f \]

and the quantity 

\[ \frac{1}{\mathfrak{M}V} \int_V [a - f_C(P)]\, dV = a - \bar f \]

does not depend on C. Our theorem will be proved if we show 
that this integral is equal to zero. 


Let ε > 0 be arbitrarily small. Let V_1(C) be the set of those 
points P ∈ V for which 

\[ |a - f_C(P)| < \varepsilon, \]

while 

\[ V_2(C) = V - V_1(C). \]

It is clear that 

\[ \begin{aligned}
\left| \int_V [a - f_C(P)]\, dV \right| &\le \int_{V_1(C)} |a - f_C(P)|\, dV + \int_{V_2(C)} |a - f_C(P)|\, dV \\
&\le \varepsilon\, \mathfrak{M}V + |a|\, \mathfrak{M}V_2(C) + \int_{V_2(C)} |f_C(P)|\, dV.
\end{aligned} \tag{13} \]

Since f_C(P) tends to a almost everywhere on V as C → ∞, 
𝔐V_2(C) → 0 as C → ∞ (convergence in measure). Thus, for 
C sufficiently large, we have 𝔐V_2(C) < ε. But 

\[ \int_{V_2(C)} |f_C(P)|\, dV \le \frac{1}{C} \int_0^C dt \int_{V_2(C)} |f(P, t)|\, dV = \frac{1}{C} \int_0^C dt \int_{V_2(C, t)} |f(P)|\, dV, \tag{14} \]

where V_2(C, t) is the set into which V_2(C) goes over during 
the interval of time t, in the natural motion of the phase space. 
By the theorem of Liouville, for all t, 

\[ \mathfrak{M}V_2(C, t) = \mathfrak{M}V_2(C), \]

and 𝔐V_2(C, t) → 0 as C → ∞, uniformly with respect to t. In 
view of the absolute continuity of integrals of summable 
functions, we can take C so large that, for all t, 

\[ \int_{V_2(C, t)} |f(P)|\, dV < \varepsilon. \]

Then (14) also shows that, in this case, 

\[ \int_{V_2(C)} |f_C(P)|\, dV < \varepsilon, \]

and then (13) gives 

\[ \left| \int_V [a - f_C(P)]\, dV \right| < \varepsilon\, (\mathfrak{M}V + |a| + 1). \]

Thus the left-hand member of this inequality will be as small 
as we like if C is sufficiently large, and since it does not depend 
on C, it must be equal to zero, which had to be proved. 


7. Structure functions. From point of view of physics, the 
most important phase function of the given mechanical system 
is its total energy 


\[ E = E(q_1, \ldots, q_s;\, p_1, \ldots, p_s). \]


For an isolated system this function has a constant value, in 
other words, represents an integral of the system of equations 
of motion. Therefore, for any constant a the region of the 
phase space for the point of which E = a, is an invariant part of 
the phase space. For simplicity, we shall call such regions sur- 
faces of constant energy. We shall consider only such cases 
where the function E has a lower bound over the whole space Γ 
(this is the case for the most interesting physical systems). 
Using the arbitrariness of choice of the additive constant in the 
expression of the potential energy (which enters as a term in 
the expression of the function E), we may assume that a 
lower bound of E is zero, so that E ≥ 0 on the whole space Γ. 
Furthermore we always shall assume that the portion of the 
phase space characterized by the inequality E < x, for each 
x > 0 is a simply connected domain bounded by the surface 
E = x. This surface we shall assume to be closed and sufficiently 
smooth to justify analytical methods which will be applied in 
various problems. We shall denote by Σ_x the surface of constant 
energy E = x. For x_1 < x_2 the surface Σ_{x_1} is situated entirely 
inside Σ_{x_2}, so that, schematically, the family of surfaces of 
constant energy could be represented as a family of concentric 
hyperspheres. In the natural motion of the phase space each 
surface of constant energy, and also each domain bounded by 
two such surfaces, is transformed (into itself), in other words, is 
an invariant part of the phase space. 

All assumptions made above are satisfied by the systems 
which are usually considered in the physical applications of the 
statistical mechanics. Moreover, the total energy of such sys- 
tems coincides with the Hamiltonian function. It follows first 
that if E is given as a function of the dynamic variables, the 
mechanical nature of the given system is completely deter- 
mined. Secondly, we can then use the argument which we used 
in §3 to prove that the Hamiltonian function is an integral of 
the system of equations of motion, as a proof of the law of 
conservation of energy for the systems we are going to consider. 

Let us denote by V(x) the volume of the part V_x of the 
space Γ, in which E < x (that is, of the domain inside the 
surface Σ_x). V(x) is a monotone function which increases from 
0 to ∞ as x varies between the same limits. If x_1 < x_2, the 
volume of the layer enclosed between the surfaces Σ_{x_1} and 
Σ_{x_2} is equal to V(x_2) − V(x_1). 

In what follows we shall use the following theorem. 

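Theorem: Let f(P) be a point function in the space Γ, sum- 
mable over a certain domain contained inside the domain V_x. 
Then 

\[ \frac{d}{dx} \int_{V_x} f(P)\, dV = \int_{\Sigma_x} f(P)\, \frac{d\Sigma}{\operatorname{grad} E}. \]

(Here dV and dΣ denote respectively the volume elements of 
the space Γ and of the surface Σ_x, and 

\[ \operatorname{grad} E = \left\{ \sum_{i=1}^{s} \left[ \left( \frac{\partial E}{\partial q_i} \right)^2 + \left( \frac{\partial E}{\partial p_i} \right)^2 \right] \right\}^{1/2} \]

is the gradient of E.) 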

For the proof observe that the element of the volume dV of 
the domain V_x in the left-hand member can be replaced by 
the product dn dΣ, where dΣ is the volume element of that 
surface of constant energy on which the element dV abuts, and 
dn the element of the outward normal to this surface (the 
thickness of the layer separating this surface from another sur- 
face immediately adjoining). Such a change signifies merely a 
special choice of the subdivision of the domain V_x, which we 
use to construct the integral, the choice which is characterized 
by the fact that initially the domain is subdivided in infinitely 
thin layers by a net of surfaces of constant energy. Thus 

\[ \int_{V_x} f(P)\, dV = \int_{V_x} f(P)\, d\Sigma\, dn. \]

To simplify, let us denote the dynamic coordinates of the point 
P by x_1, x_2, ..., x_{2s}, as it was done in §4. Since 

\[ dx_i = dn \cos(n, x_i) \qquad (1 \le i \le 2s), \]

the increment of the energy when we pass from some surface 
of constant energy to an infinitely near surface will be given 
by the formula 

\[ dE = \sum_{i=1}^{2s} \frac{\partial E}{\partial x_i}\, dx_i = dn \sum_{i=1}^{2s} \frac{\partial E}{\partial x_i} \cos(n, x_i). \]

But it is known that 

\[ \cos(n, x_i) = \frac{\partial E / \partial x_i}{\operatorname{grad} E} \qquad (1 \le i \le 2s), \]

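whence 

\[ dE = dn \cdot \operatorname{grad} E, \qquad dn = \frac{dE}{\operatorname{grad} E}, \]

and we get 

\[ \int_{V_x} f(P)\, dV = \int_0^x dE \int_{\Sigma_E} f(P)\, \frac{d\Sigma}{\operatorname{grad} E}. \]

The value of the inner integral here is a function of E. On 
denoting this function by ψ(E) we get 

\[ \frac{d}{dx} \int_{V_x} f(P)\, dV = \frac{d}{dx} \int_0^x \psi(E)\, dE = \psi(x) = \int_{\Sigma_x} f(P)\, \frac{d\Sigma}{\operatorname{grad} E}, \]

as was to be proved. 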

Since, by the law of conservation of energy, each surface 
Σ_x of constant energy of the space Γ is an invariant part of 
the space Γ, in the natural motion of this space every measur- 
able set M situated on this surface, during any interval of 
time goes over into another measurable set situated on the 
same surface. However, if we define the measure of the set M 
by 

\[ \mathfrak{M}M = \int_M d\Sigma, \tag{15} \]

this measure, in general, would not remain invariant. The set 
M moving on the surface Σ_x in the natural motion of the phase 
space, would at the same time change its measure. Such a 
situation would have been extremely inconvenient for our 
theory, since, in discussing motion on the surface Σ_x we would 
have been deprived of such valuable tools as the theorems of 
Liouville, Birkhoff and their corollaries. That is why in the 
statistical mechanics the definition (15) of measure on the 
surface Σ_x is always replaced by another definition which is 
invariant with respect to the natural motion of the space Γ. 
After such a replacement we can consider each surface of con- 
stant energy as a bounded invariant region, to whose natural 
motion all the results obtained in preceding paragraphs 
can be applied. In the construction of our theory which follows 
we make precisely this choice. 

In order to obtain an invariant definition of measure on the 
given surface Σ_x of constant energy E = x, consider any 
measurable (in the sense of (15)) set M on it. At each point 
of this set draw the outward normal to the surface Σ_x to its 
intersection with the infinitely near surface Σ_{x+Δx}. The part 
of the space Γ which is filled by these normals is bounded and 
will be denoted by D. The volume 

\[ \int_D dV \]

of this part is clearly invariant with respect to the natural 
motion of the phase space and can be represented also in the 
form 

\[ \int_{x < E < x + \Delta x} f(P)\, dV, \]

where f(P) has value 1 or 0 according as P does, or does not, 
belong to D. The ratio of this volume to Δx and also the limit 
of this ratio as Δx → 0 are also invariants of the natural motion 
of the space Γ. But by the theorem which just has been proved 
this limit is 

\[ \int_{\Sigma_x} f(P)\, \frac{d\Sigma}{\operatorname{grad} E} = \int_M \frac{d\Sigma}{\operatorname{grad} E}. \]

Thus if we define 

\[ \mathfrak{M}M = \int_M \frac{d\Sigma}{\operatorname{grad} E}, \]

we will have an invariant definition of measure on the surface 
Σ_x. This definition of measure we shall use in all that follows 
(it is obvious that it satisfies all conditions which a definition 
of measure has to satisfy). 

In particular, the measure (volume) Ω(x) of the whole surface 
Σ_x will be 

\[ \Omega(x) = \int_{\Sigma_x} \frac{d\Sigma}{\operatorname{grad} E}. \tag{16} \]

If we put f(P) = 1 in the general theorem proved above, we 
obtain 

\[ \Omega(x) = V'(x). \tag{17} \]

Thus, the measure of the whole surface of constant energy, with 
our definition of measure, is simply equal to the derivative with 
respect to x of the volume of the domain V_x of phase space 
bounded by this surface. This fact considerably simplifies the 
geometry of the structure of the phase space, in which we are 
interested now. 
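
Relation (17) can be verified numerically on a system where grad E varies along the surface of constant energy. The sketch below assumes the Hamiltonian H = p^2/2 + ω^2 q^2/2 with ω = 2, for which the surfaces of constant energy are ellipses; the Hamiltonian, the parametrization and all function names are illustrative assumptions rather than material from the text.

```python
import math

OMEGA = 2.0  # assumed frequency in H = p**2 / 2 + OMEGA**2 * q**2 / 2

def volume(x):
    # Area of the ellipse H < x, known in closed form for this Hamiltonian.
    return 2.0 * math.pi * x / OMEGA

def structure_function(x, n=4000):
    """Midpoint-rule estimate of the integral of dSigma / grad E over H = x."""
    a = math.sqrt(2.0 * x) / OMEGA  # semi-axis along q
    b = math.sqrt(2.0 * x)          # semi-axis along p
    h = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * h
        q, p = a * math.cos(theta), b * math.sin(theta)
        # Arc length element of the curve (a cos(theta), b sin(theta)).
        d_sigma = math.hypot(a * math.sin(theta), b * math.cos(theta)) * h
        # Magnitude of the gradient of H at the point (q, p).
        grad_e = math.hypot(OMEGA ** 2 * q, p)
        total += d_sigma / grad_e
    return total

x = 1.5
surface_measure = structure_function(x)
dx = 1e-4
volume_derivative = (volume(x + dx) - volume(x - dx)) / (2 * dx)
```

Both `surface_measure` and `volume_derivative` should be close to 2π/ω, which for this choice of ω is π. The ordinary arc length of the ellipse, by contrast, depends on x, so it is the weight 1/grad E that produces the agreement with V'(x).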

According to our definition of measure we shall interpret the 
expression 

\[ \bar f = \frac{1}{\Omega(x)} \int_{\Sigma_x} f(P)\, \frac{d\Sigma}{\operatorname{grad} E} \]

as the average of any summable function f(P) defined on Σ_x. 
This is the limit, as Δx → 0, of the average of f(P) on the layer 
enclosed between the surfaces Σ_x and Σ_{x+Δx}. By the theorem 
proved above this average can be also represented in the form 

\[ \bar f = \frac{1}{\Omega(x)}\, \frac{d}{dx} \int_{V_x} f(P)\, dV. \]

This formula in many cases turns out very convenient for 
evaluation of averages of phase functions on surfaces of con- 
stant energy. 
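
As a hedged illustration of how a surface average can be computed from a volume integral, take the assumed one-degree-of-freedom energy E = (q^2 + p^2)/2 and the phase function f = q^2 (neither is taken from the text). By the theorem proved above, the integral of f dΣ/grad E over Σ_x equals the derivative with respect to x of the integral of f over V_x, so the average is that derivative divided by Ω(x). Here V_x is the disk of radius √(2x), and polar coordinates (r, θ) give:

```latex
\begin{align*}
V(x) &= \pi \bigl(\sqrt{2x}\bigr)^2 = 2\pi x, \qquad \Omega(x) = V'(x) = 2\pi, \\
\int_{V_x} q^2 \, dV &= \int_0^{\sqrt{2x}} \int_0^{2\pi} r^2 \cos^2\theta \; r \, d\theta \, dr
  = \pi \, \frac{(2x)^2}{4} = \pi x^2, \\
\bar f &= \frac{1}{\Omega(x)} \, \frac{d}{dx} \int_{V_x} q^2 \, dV = \frac{2\pi x}{2\pi} = x .
\end{align*}
```

The result agrees with the symmetry between q and p in this example: the averages of q^2 and p^2 on Σ_x are equal and add up to 2x.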

The function Ω(x) defined by (16) is a monotone function 
increasing from 0 to ∞* as x varies between the same limits. 
As we shall see later, this function completely determines the 
most important features of the mechanical structure of the 
corresponding physical system. In what follows we shall call 
this function the structure function of the given system. 
Therefore, the structure function of the given system can be 
defined either as the measure of the surface of constant energy 
(with our special definition of measure) or as the derivative 
with respect to x of the function V(x) defined above. 

*Footnote of the translator. This appears as an additional assumption. 


8. Components of mechanical systems. In this paragraph, 
as we have done several times before, we denote by x_1, x_2, 
..., x_{2s} the dynamical coordinates of a point of the space Γ, 
where the order of numeration is irrelevant. Each phase func- 
tion and, in particular, the total energy E of the given system, 
is a function of these 2s variables. 

Suppose that the function 

\[ E = E(x_1, \ldots, x_{2s}) \]

can be represented as a sum of two terms E_1 and E_2 where the 
first term depends on some (not all) of the dynamical co- 
ordinates and the second term depends on the remaining co- 
ordinates. Since the order of numeration of the dynamical coor- 
dinates is irrelevant we may write E = E_1 + E_2 where 

\[ E_1 = E_1(x_1, x_2, \ldots, x_k), \]
\[ E_2 = E_2(x_{k+1}, x_{k+2}, \ldots, x_{2s}). \]

In such a case we agree to say that the set (x_1, x_2, ..., x_{2s}) 
of the dynamical coordinates of the given system is decomposed 
in two components (x_1, ..., x_k) and (x_{k+1}, ..., x_{2s}). We 
could express it also by saying that the given system consists 
of two “components” which appear as bearers of the corre- 
sponding sets of the dynamic coordinates. From the point of 
view of the formal theory it does not make any difference 
whether wa call a component of the given system the set of 
coordinates (z, , -*- , 2) itself, or if wa attribute to this set 
a certain “bearer” to which we shall give the name of a com- 
ponent. We shall use both terminologies without danger of any 
confusion. From a more realistic point of view it appears natural 
to try to interpret each component aa a separate physical system 
which is contained in the given system. However, such a 
viewpoint will be too narrow, and in some asses will not suit 
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our purposes. The point is that, although each materially 
isolated part of our system determines in most cases a certain 
component of this system, it is useful to consider occasionally 
such components (that is sets of coordinates) to which there 
does not correspond any materially isolated part of the system. 
The isolated character of such components is of a purely energy 
nature, the precise sense being given by the above definition of 
a component. For instance, if the system consists of one material
particle whose velocity components and mass
are respectively u, v, w, m, and if its energy E reduces to the
kinetic energy

$$E = \frac{m}{2}\,(u^2 + v^2 + w^2),$$

we could consider the quantity u as a component of our sys-
tem, and formally attribute to it a certain “bearer” whose
energy is $mu^2/2$, although in this case there is no question
of any material bearer (we shall see later that such considera-
tions can prove to be useful).

In any case, if to each component of the given system we
may attribute a definite energy (from the definition of a
component), then each component, being essentially a group
of dynamic coordinates, has its own phase space, and the state
of this component (that is the set of values of its coordinates)
is represented by a point of this phase space. The phase space
$\Gamma$ of the given system is clearly the direct product of the phase
spaces $\Gamma_1$ and $\Gamma_2$ of its two components, and the volume ele-
ment of the space $\Gamma$ can be taken to be equal to the product
$dV_1\, dV_2$ of the volume elements of the spaces $\Gamma_1$ and $\Gamma_2$.

Furthermore, each component has its own structure function. 
The law of composition of structure functions, that is the 
formula which determines the structure function of the given 
system in terms of structure functions of its components, is 
one of the most important basic formulas of our theory. We 
now pass on to the derivation of this formula. 

First we make the following observation: If we have a phase
function of the given system whose value is completely de-
termined by the value of the energy of the system at the
corresponding point of the space $\Gamma$, then the integral of such
a function f(E), taken over the domain of the space $\Gamma$ en-
closed between two surfaces of constant energy $\Sigma_{x_1}$ and $\Sigma_{x_2}$,
can be easily expressed in the form of a simple integral, namely,

$$\int_{x_1 < E < x_2} f(E)\, dV = \int_{x_1}^{x_2} f(x)\,\Omega(x)\, dx. \tag{18}$$


Indeed, we may evaluate our multiple integral by subdivid-
ing the domain $x_1 < E < x_2$ of the space into infinitely thin
layers between surfaces of constant energy. In the layer be-
tween the two surfaces $\Sigma_x$ and $\Sigma_{x+\Delta x}$ the function f(E) (which
for simplicity is supposed to be continuous) can (with an in-
finitesimal error) be assumed to be equal to f(x), while the
volume of this layer, up to infinitesimals of higher order, is

$$V'(x)\,\Delta x = \Omega(x)\,\Delta x,$$

which gives formula (18). In particular,

$$\int_{V_x} f(E)\, dV = \int_0^x f(y)\,\Omega(y)\, dy. \tag{19}$$


This formula is used in a great number of applications. 
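
As a quick check of formula (19), one may work it out by hand for a system simple enough to integrate directly. The example below is an illustrative assumption and is not taken from the text: a single harmonic oscillator with one degree of freedom, with energy $E = (p^2 + q^2)/2$, where q and p are hypothetical names for its two dynamical coordinates, and with the choice f(E) = E. Here V(x) is the volume of the region E < x and $\Omega(x) = V'(x)$.

```latex
% Assumed example: one-dimensional harmonic oscillator, E = (p^2 + q^2)/2.
% The region E < x is a disc of radius \sqrt{2x} in the (q, p) plane.
V(x) = \pi \bigl(\sqrt{2x}\bigr)^2 = 2\pi x, \qquad \Omega(x) = V'(x) = 2\pi .

% Right-hand side of (19) with f(E) = E:
\int_0^x y\,\Omega(y)\,dy = 2\pi \int_0^x y\,dy = \pi x^2 .

% Left-hand side of (19), computed directly in polar coordinates (r, \theta),
% where E = r^2/2 and dV = r\,dr\,d\theta:
\int_{E < x} E\,dV
  = \int_0^{2\pi} d\theta \int_0^{\sqrt{2x}} \frac{r^2}{2}\, r\,dr
  = 2\pi \cdot \frac{(2x)^2}{8} = \pi x^2 .
```

Both sides agree, as (19) requires. The multiple integral on the left is reduced to a simple integral on the right, which is the whole point of the formula.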

Now let V(x) and $\Omega(x)$ be the functions as defined above for
the given system, and $V_1(x)$, $\Omega_1(x)$ and $V_2(x)$, $\Omega_2(x)$ the
corresponding functions for the two components of the given
system. Then

$$V(x) = \int_{V_x} dV = \int_{(V_1)_x} dV_1 \int_{(V_2)_{x - E_1}} dV_2 = \int_{(V_1)_x} V_2(x - E_1)\, dV_1,$$

where $(V_1)_x$ denotes the set of all points of the space $\Gamma_1$ at
which $E_1 < x$, and $(V_2)_{x - E_1}$ is defined in an analogous fashion.
Since the phase function $V_2(x - E_1)$ of the first component
depends only on its energy $E_1$, by formula (18) we have

$$V(x) = \int_0^x V_2(x - E_1)\,\Omega_1(E_1)\, dE_1,$$

and since $V_2(x - E_1) = 0$ for $E_1 > x$, the integration can
be extended to infinity so that

$$V(x) = \int_0^\infty V_2(x - y)\,\Omega_1(y)\, dy.$$

Finally, on differentiating this with respect to x, we have

$$\Omega(x) = \int_0^\infty \Omega_1(y)\,\Omega_2(x - y)\, dy. \tag{20}$$


This is the law of composition of structure functions which 
we intended to establish. 
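
The law of composition can be verified by hand in a simple case. The following sketch is an illustration under stated assumptions and is not part of the original derivation: in the spirit of the one-particle example above, we take a system whose only coordinates are two velocity components u and v, with energies $E_1 = mu^2/2$ and $E_2 = mv^2/2$, and we measure volume directly in the coordinates u, v.

```latex
% Assumed components: E_1 = m u^2 / 2 and E_2 = m v^2 / 2.
% The set E_1 < x is the interval |u| < \sqrt{2x/m}, so
V_1(x) = 2\sqrt{\frac{2x}{m}}, \qquad
\Omega_1(x) = V_1'(x) = \sqrt{\frac{2}{m x}} \quad (x > 0),
% and \Omega_2 has the same form. Since \Omega_2(x - y) = 0 for y > x,
% formula (20) gives
\Omega(x) = \int_0^x \sqrt{\frac{2}{m y}}\,\sqrt{\frac{2}{m (x - y)}}\;dy
          = \frac{2}{m} \int_0^x \frac{dy}{\sqrt{y\,(x - y)}}
          = \frac{2\pi}{m} .

% Direct computation: E < x is the disc u^2 + v^2 < 2x/m, hence
V(x) = \frac{2\pi x}{m}, \qquad \Omega(x) = V'(x) = \frac{2\pi}{m} .
```

The two results coincide. Note that the integrand in (20) is singular at both ends of the interval here, yet the integral is finite.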

All that has been said above, without any modifications, can 
be extended to the case where the given system consists not 
of two, but of any number of components. The definition of 
component remains unchanged. As before, the space $\Gamma$ is the
direct product of the phase spaces of all the components. For
the law of composition of structure functions, in case of n
components, we have the formula

$$\Omega(x) = \int \left\{ \prod_{i=1}^{n-1} \Omega_i(x_i)\, dx_i \right\} \Omega_n\!\left( x - \sum_{i=1}^{n-1} x_i \right), \tag{21}$$

where the integration is extended over the whole space of
(n - 1) dimensions (or over the domain $x_i > 0$, $1 \le i \le n - 1$,
which is the same, since $\Omega_i(x_i) = 0$ for $x_i < 0$). To derive this
formula it is simplest to use the method of complete induction 
from n to (n + 1), by decomposing the n-th component in 
(21) into two components, and by expressing the last factor in 
terms of structure functions of these two components, using 
formula (20). 
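
The inductive construction just described, in which formula (20) is applied repeatedly, can be sketched numerically. The code below is only an illustration under assumptions that do not come from the text: each component is taken to be a one-dimensional harmonic oscillator with energy (p^2 + q^2)/2, whose structure function is the constant 2*pi for positive argument, and the integral in (20) is approximated by the midpoint rule. All function names are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def oscillator_omega(x):
    """Structure function of one assumed oscillator component: 2*pi for x > 0, else 0."""
    return TWO_PI if x > 0 else 0.0


def compose(omega_a, omega_b, steps=2000):
    """Return the structure function of the combined system via formula (20).

    The integral over y from 0 to x of omega_a(y) * omega_b(x - y) is
    approximated with the midpoint rule; both factors vanish outside (0, x).
    """
    def omega(x):
        if x <= 0:
            return 0.0
        h = x / steps
        total = 0.0
        for j in range(steps):
            y = (j + 0.5) * h
            total += omega_a(y) * omega_b(x - y)
        return total * h
    return omega


def compose_many(omegas, steps=200):
    """Fold formula (20) over a list of component structure functions."""
    result = omegas[0]
    for omega_next in omegas[1:]:
        result = compose(result, omega_next, steps)
    return result


def oscillator_closed_form(n, x):
    """Closed form for n assumed oscillators: (2*pi)^n * x^(n-1) / (n-1)!."""
    return TWO_PI ** n * x ** (n - 1) / math.factorial(n - 1)


if __name__ == "__main__":
    x = 1.5
    for n in (1, 2, 3):
        numeric = compose_many([oscillator_omega] * n)(x)
        print(n, numeric, oscillator_closed_form(n, x))
```

The closed form used for comparison follows from the volume of a ball of radius sqrt(2x) in 2n dimensions. Each further nested application multiplies the cost by the number of steps, so the sketch is practical only for a few components; this is one way of seeing why asymptotic formulas are needed for systems with many components.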

To conclude these brief preliminary considerations, we re-
mark that the conception of decomposition of the system into
components leads to a specific methodological paradox, as has
been observed several times. As stated already at the beginning
of Chapter I, with all the generality and abstractness of the
hypotheses of the statistical mechanics, it is invariably assumed
that particles of the matter are in a state of intensive energy
interaction, where the energy of one particle is transferred to
another (for instance by means of collisions). As we shall see 
in more detail later, the statistical mechanics bases its method 
precisely on a possibility of such an exchange of energy be- 
tween various particles constituting the matter. However, if 
we take the particles constituting the given physical system 
to be the components in the above defined sense, we are ex- 
cluding the possibility of any energetical interaction between 
them. Indeed, if the Hamiltonian function, which expresses the 
energy of our system, is a sum of functions each depending only 
on the dynamic coordinates of a single particle (and repre- 
senting the Hamiltonian function of this particle), then, clearly, 
the whole system of equations (1) splits into component sys- 
tems each of which describes the motion of some separate 
particle and is not connected in any way with other particles. 
Hence the energy of each particle, which is expressed by its 
Hamiltonian function, appears as an integral of equations of 
motion, and therefore remains constant. 

The serious difficulty so created is resolved by the fact that
we can consider particles of matter as only approximately
energetically isolated components. There is no doubt that a
precise expression for the energy of the system must contain 
also terms which depend simultaneously on the energy of 
several particles (mutual potentials of particles), and which 
assure the possibility of an energetical interaction between the 
particles (from a mathematical point of view, prevent the 
splitting of the system (1) into systems referring to single 
particles). But, inasmuch as forces of interaction between the 
particles manifest themselves only at very small distances, such 
mixed terms in the expression of energy, representing mutual 
potential energy of particles, will be (in the great majority of 
points of the phase space) negligible as compared with the 
kinetic energy of particles or with the potential energy of 
external fields. In particular they will contribute very little in 
evaluating various averages, hence in the majority of compu- 
tations in statistical mechanics we will be able to neglect such 
terms, and, to a good approximation, assume that the energy
of the system is equal to the sum of the energies of constituent
particles, these thus appearing as components of our system in
the above defined sense. However, these mixed terms which
are neglected, from the point of principle play a very important
role, since it is precisely their presence that assures the possi-
bility of an exchange of energy between the particles, on which
is based the whole method of the statistical mechanics.


CHAPTER III


ERGODIC PROBLEM 


9. Interpretation of physical quantities in statistical me- 
chanics. The values of physical quantities which characterize 
the state of the system we are studying are determined uniquely 
by this state, which, in turn, is described in our theory by 
the set of the dynamic coordinates. Thus a physical quantity, 
as a rule, appears as a function of the dynamic coordinates
of the system, or, what amounts to the same thing, as a function
of a point of its phase space, that is, as a phase function. Therefore,
if we wish to compare the deductions of our theory with the
experimental data from measurements of various physical quan- 
tities, we will compare the values of various physical quantities 
found experimentally with the values of the corresponding 
phase functions furnished by our theory. However, such a 
statement of the problem leads immediately to a series of 
methodological difficulties which threaten to leave this problem 
without any content. The point is that the phase functions of 
the system in general represent quantities which assume widely 
distinct values for different states of the system. In order to 
compare these values with experimental data we should have 
a possibility of determining the state of the system at the time 
of the experimental measurement, that is, to determine the 
values of all the dynamic coordinates for this time. For in- 
stance, in the case of a gas, this would mean to determine at 
least the positions and the velocities of all constituent mole- 
cules, a problem which obviously is insoluble. If we forsake 
this idea, then what states of the system we should assume in 
order to compute those values of the phase functions which 
will have to be compared with the experimental data is an 
entirely open question. 

The following considerations will allow us to alleviate to a
certain extent the acuteness of this difficulty. An experiment
or an observation which gives the measurement of a physical
quantity is performed not instantaneously, but requires a certain
interval of time which, no matter how small it appears to
us, would, as a rule, be very large from the point of view of an 
observer who watches the evolution of our physical system. 
This system will be subjected during this interval of time to 
various perturbations (such as mutual collisions of molecules)
which may change essentially the values of the corresponding
phase function. Thus we will have to compare experimental
data not with separate values of phase functions, but with their
averages taken over very large intervals of time, in other words,
according to what was said in the preceding chapter, with time
averages of phase functions over a trajectory which represents
the evolution of our physical system.

This consideration of course changes the picture quite con- 
siderably, but, at the same time, immediately introduces new 
difficulties. The first of these arises from the fact that the time 
averages of a given phase function taken over a given trajectory 
may have widely distinct values for different time intervals. 
This difficulty is alleviated considerably by the theorem of 
Birkhoff, which states that, for almost all trajectories, the time 
averages of the given phase function, which tend to a definite 
limit when the time interval tends to infinity, will assume 
approximately the same value for all time intervals, sufficiently 
large. It is therefore natural to take this limit as the time 
average furnished by our theory. 

There is, however, another difficulty which is much harder
to overcome, namely, that we cannot determine which tra-
jectory in the phase space is traversed by our system. If this
system has s degrees of freedom (where s, as a rule, is a very
large number), in order to determine this trajectory we would
need to find values of (2s - 1) integrals of the system, which
do not depend on time, while actually we can determine ap-
proximately values of only very few of these integrals. (The
value of the energy of the system almost always is considered
to be given.) The determination of any integral gives us in the
phase space a surface which contains the trajectory in ques-
tion. If we know the values of k of such integrals, then we know
that our trajectory belongs to a certain “reduced” manifold of
(2s - k) dimensions, so that for k = 2s - 1 the trajectory will
be completely determined; if however, as usually happens, we
know only one integral of energy, then k = 1, and the only
thing we can say about the trajectory is that it belongs to a
manifold of (2s - 1) dimensions (surface of constant energy).

There is however a case where this difficulty does not exist,
in view of the theorem of section 6. If the given surface of
constant energy is metrically indecomposable, then the time
averages of any summable function are the same for almost all 
trajectories and coincide with the phase average of this func- 
tion on the given surface of constant energy. In this case every 
physical quantity receives a definite interpretation in our 
theory as the phase average of the corresponding phase func- 
tion, and the above mentioned difficulties no longer exist. 
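
The coincidence of time and phase averages in the indecomposable case can be watched numerically on a toy system. The sketch below rests on assumptions that are not in the text: a single harmonic oscillator with energy (p^2 + q^2)/2, the phase function f = q^2, and hypothetical function names. For this system with one degree of freedom the surface of constant energy is a single closed trajectory, so it is trivially metrically indecomposable, and the invariant measure on it is uniform in the angle because the gradient of the energy has constant length on the circle.

```python
import math


def time_average_q2(energy, duration, phase=0.0, steps=100_000):
    """Midpoint-rule time average of f = q^2 along q(t) = sqrt(2E) cos(t + phase)."""
    amplitude = math.sqrt(2.0 * energy)
    h = duration / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * h
        q = amplitude * math.cos(t + phase)
        total += q * q
    return total * h / duration


def phase_average_q2(energy, points=100_000):
    """Average of q^2 over the circle p^2 + q^2 = 2E, uniform in the angle."""
    amplitude = math.sqrt(2.0 * energy)
    total = 0.0
    for j in range(points):
        theta = 2.0 * math.pi * (j + 0.5) / points
        total += (amplitude * math.cos(theta)) ** 2
    return total / points


if __name__ == "__main__":
    energy = 1.0
    print("phase average:", phase_average_q2(energy))
    for duration in (1.0, 10.0, 100.0, 1000.0):
        print(duration, time_average_q2(energy, duration))
```

For short time intervals the time average depends noticeably on the interval, while for long ones it settles near the phase average, whatever the starting point on the trajectory. A system of many degrees of freedom offers no such explicit trajectory, which is why the question of indecomposability matters.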

Actually, in all expositions of the statistical mechanics, this 
phase average is taken as a theoretical interpretation of any 
physical quantity. In doing so either no arguments at all are 
given in favor of such a choice, or a special hypothesis is con- 
structed in order to justify this choice, or, finally, various 
reasons are cited in favor of such an interpretation, indicating 
at the same time that these reasons are not logically obligatory 
and that the interpretation was generally accepted in view of 
the successful results to which the theory based on this in- 
terpretation leads. The last method appears to us most prefer- 
able scientifically, and, in the following paragraphs of this 
chapter, we shall attempt to discuss in detail the most im- 
portant questions pertaining to the subject, from the point of 
view of modern ideas. 

At present we remark in addition that, in view of what has 
been said above, the task of a mathematical justification of the 
statistical mechanics reduces essentially to two problems. The 
first of these two problems is to investigate as exhaustively as
possible, under what conditions and to what degree the time
averages of phase functions, which, as we have seen, appear as
a natural interpretation of experimental measurements, can be
replaced by the phase averages of the same functions. The
desirability, and even inevitability, of such a replacement is 
clear: The computation of time averages requires the knowledge 
of trajectories, that is, the complete integration of the equa- 
tions of motion and determination of all the constants of 
integration, which of course cannot be done for systems con- 
sidered in statistical mechanics, with their large numbers of 
degrees of freedom. As said before, we shall discuss the question 
connected with this first problem in the following paragraphs 
of the present chapter. 

The second problem, which will be considered in the next
chapters, is to create a general method for approximate compu-
tation of phase averages on surfaces of constant energy. The
evaluation of phase averages, contrary to the evaluation of
time averages, is a problem completely accessible to mathe-
matical analysis, although it involves certain difficulties. This
problem is always formulated as a problem of constructing a
general method which would allow us to derive sufficiently
simple asymptotic formulas for phase averages, under the as-
sumption that the number of degrees of freedom of the given
system increases without limit. Since statistical mechanics deals
with systems with very large numbers of degrees of freedom, we may
expect that such asymptotic expressions will be sufficiently near
to precise values of phase averages.


10. Fixed and free integrals. The problem of a theoretical
justification of the replacement of time averages by phase
averages is usually called the ergodic problem (sometimes this
terminology is used for other related problems). Almost always,
one considers the averages of phase functions on a given
surface of constant energy. Therefore, in attempting to give a
short account of the history and of the present status of the
ergodic problem we first should try to understand why in our
theory we choose precisely these phase averages. From a purely
theoretical point of view, such a choice, at first glance, appears
casual and arbitrary. Usually such a concept of phase averages
is justified by the following argument. Since the energy is an
integral of the equations of motion, each trajectory is situated
entirely on some surface of constant energy $\Sigma_x$. The values
of the function under consideration, at the points of the phase
space $\Gamma$ which are not on this surface $\Sigma_x$, play no role in
evaluating the time averages, and therefore should not be taken
into account in evaluating the phase averages, if we desire that
these phase averages be near the time averages.

Such an argument contains a vulnerable point. Everything 
which is said therein about the energy integral can be re- 
peated, word for word, for any other integral of motion, which 
does not depend on time. Since, for a system with s degrees
of freedom, there are (2s - 1) of such independent integrals,
we should fix the values of each of them beforehand, or, in
other words, determine the trajectory of the system in the
phase space, and evaluate our averages along this trajectory.
This, however, is never done, and is not feasible to do, because
the great majority of other integrals of motion are not known,
so that we cannot determine the trajectory which represents
the evolution of our system. 

Thus the whole question requires more careful consideration.
It will be most convenient for us to start by making more
precise the above argument in favor of a preliminary specifica-
tion of a surface of constant energy $\Sigma_x$. In itself this argument
is not only entirely convincing, but serves as a starting point
of our discussion.

Suppose that we do not specify the surface $\Sigma_x$, but try to
evaluate the phase averages of our functions over the whole
space $\Gamma$. The first, and comparatively non-essential difficulty
here is due to the fact that this space has an infinite volume,
so that averages of simplest functions would become infinite
or undetermined, if we would not introduce a preliminary
weighting of the space with the purpose of diminishing the
contributions by distant portions of the space. However, such
a weighting of various parts of the space $\Gamma$ would necessarily
introduce some element of arbitrariness, which would make the
computation of phase averages based on this weighting some-
what doubtful.¹

However, this difficulty, as we have observed before, is not 
essential in comparison with the other difficulty which ap- 
parently makes the whole method completely useless. Indeed, 
the energy of the given system is a phase function, and un-
doubtedly, one of the most important ones. Our method should
attribute to it some definite average value $\bar{E}$, as for any other
phase function. But what physical meaning could this average
have? In particular, could we expect that the time average of
the energy of the given system will be equal to $\bar{E}$ (or at least
near to $\bar{E}$) in the majority of the evolutionary processes of
which the system is capable? It is sufficient to formulate this 
assumption to understand its absurdity. In each evolutionary 
process the given isolated system preserves a constant value of 
its energy. This constant value we can select, in general, 
arbitrarily over a rather wide range, and in different cases we 
can select different values which are quite far from any fixed 
number prescribed by the theory. The very attempt to attribute 
to the energy of our system any fixed value, no matter by what 
method this value is computed, contradicts reality. And so, the 
preliminary reduction of the phase space to some surface of
constant energy appears really inevitable for any efficient evalu- 
ation of the phase averages. 

Let us now investigate why we may disregard the claims of all
other integrals independent of time to be treated in the same
way as we have treated the energy. Such claims, as we have
already observed before, appear well founded, at least at first
sight. However, a more attentive consideration will show that
the situation is different. For better understanding we shall
split our argument into several steps.

¹Here the question is one of introducing weights “universal” for the
given system. A weighting adjusted to a definite value of the energy (or of
some other integral) as is done, for instance, in the so-called canonical
distribution of Gibbs (see Chapter V, section 25) is equivalent to a prelim-
inary specification of the surface $\Sigma_x$, and therefore is not of interest to us
at present.

1. As we shall see later, the majority of physical phase func- 
tions with which we have to deal in statistical physics have a 
specific structure which makes the values of such a function, 
defined on every surface $\Sigma_x$, very near each other at all points,
except for a set of a very small measure. This implies that for
the majority of the trajectories situated on $\Sigma_x$, the time
averages of such a function will have values very near each
other, and therefore near the phase average of the function
over the surface $\Sigma_x$.

2. Let now I be any integral of the equations of motion of 
the given system which does not depend on time and is dis- 
tinct from the energy integral. If, considered as a phase func- 
tion it has a structure described under 1, then the possibility 
of replacing the time average by the phase average for this 
integral does exist. If, however, I does not have such a structure,
then the phase function which it represents, as a rule, will not 
have actual physical interpretation, and therefore the rela- 
tionship between its various averages will not present any 
interest for us. 

3. It is possible however that in some cases the arguments 
of 2 cannot satisfy us even if they are quite correct. Such cases 
occur when the integral I represents a physical quantity which
plays a role analogous to the role of the energy, that is a
quantity for which we are able to select a value arbitrarily
within certain limits, by regulating the conditions of our
process, or at least are able to determine its value experiment-
ally. For instance, let $\bar{I}$ be the phase average of the function I
over the surface $\Sigma_x$. In view of what has been said in 1 and 2,
the time averages of the function I for most trajectories will
be near $\bar{I}$. But if, for various reasons, we forced I to assume
a value $I_0$ which is far from $\bar{I}$, or if we have found such a value
by experimental measurement, then of course we have to
attribute to I the value $I_0$, not $\bar{I}$. The fact that in most cases
I is near $\bar{I}$ cannot force us to accept this relation if we know
that in our case (whether due to our interference or not) the
value of I is far from $\bar{I}$. Furthermore, if we know the actual value
of I, we will have to take it into account in computing values
of other phase functions. In other words, in such a case the
value of the integral I could and should be specified beforehand,
just as has been done with the integral of energy.*

To abbreviate, let us call the integrals of the type described
above controllable integrals (because we can either select, or
determine experimentally, their values; in other words we control
their values in the process we consider). Let our system have k
such controllable integrals (they will almost always include the
integral of energy). If we fix the value of each of them in our
process, we shall specify in the phase space of our system a
certain reduced manifold of (2s - k) dimensions, over which we
will have to take the phase averages of the phase functions in
which we are interested. In the great majority of cases dealt
with in the statistical physics, the only controllable integral is
the energy integral, so that the reduced manifold will be simply
the surface of constant energy $\Sigma_x$. There are, however, cases
where, simultaneously with the energy integral, some other
integrals become controllable (for instance integrals of the
momentum components). In such cases the phase averages are
actually taken over the manifolds of smaller number of dimen-
sions, which are obtained by fixing the values of controllable
integrals.

*An excellent example of how radically all the results of computations
could be changed by fixing the value of such an integral, is given by the
statistical schemes of Bose-Einstein and Fermi-Dirac in quantum physics.

As concerns the remaining free (that is, not fixed) integrals,
each of them, if it represents an actual physical quantity, will
be almost constant, in the above described sense, on the
reduced manifold, as stated in 2. This gives us a certain reason
to expect that its value will be near its phase average on the
reduced manifold, in the majority of cases met in practice. In
other words, we assume that the location of the image point
of a system on the reduced manifold is a random event such
that a very small probability corresponds to the location of the
point in a set of very small measure (absolute continuity!).
Hence it is almost certain that our integral assumes values near
to its average, in the majority of experimental measurements.
Of course, the question of correctness of all this hypothetical
construction can be ultimately decided only by a comparison of
the deductions of our theory with the experimental data.

The fact that the distinction between the fixed and free 
integrals is determined not by their mathematical nature, but, 
so to say, by their role in our scientific or practical experience,
should not at all disturb a mathematician. It is typical in all
applications of the theory of probability. For instance, under
normal conditions we consider the number of tickets drawn in
a lottery as a random variable. However, if we succeed in
studying the mechanism of the drawing to such extent that we
shall be able to determine this number beforehand, or, still
more, if we succeed in drawing the number as we desire, then
all elements of randomness disappear, although the mechanism 
of drawing is the same in both cases. 

In what follows we shall consider as a reduced manifold of 
the phase space the surface of constant energy, which corre- 
sponds to the actual situation for the majority of systems dis- 
cussed in the statistical physics. Thus it will be our purpose to 
collect as many arguments as possible in favor of the fact that
the time averages of physically most important phase functions, 
for the great majority of trajectories situated on the given sur- 
face of constant energy have values which are close to each 
other (and therefore, necessarily, near the values of the corre- 
sponding phase averages). 


11. Brief historical sketch. As we have indicated already, 
many authors attempted to prove the coincidence of the time 
and the phase averages by introducing various special hy- 
potheses, more or less plausible. Such hypotheses usually were 
called “ergodic hypotheses”. The first of them was stated by 
Boltzmann, who also was the first to use the terminology.
Boltzmann conjectured that each surface of constant energy
consists of a single trajectory. In other words, no matter what
is the state of our system at given time, it will pass (or has
already passed) through any other state with the same value
of the total energy.

Using this conjecture it is possible to establish the coincidence 
between the time and phase averages on each surface of con- 
stant energy. However, the conjecture itself is logically contra- 
dictory, which soon was found out, and which at present is 
topologically obvious, since no trajectory can have multiple 
points and therefore cannot fill out the whole many-dimensional
space.

After this failure attempts were made for a long time to
replace the ergodic hypothesis of Boltzmann by the “quasi-
ergodic” hypothesis, according to which every trajectory,
although not filling completely the surface of constant energy
on which it is situated, constitutes an everywhere dense point
set (that is intersects every element of the surface). However,
even if we disregard the fact that the logical compatibility of
this hypothesis has not been established, nobody succeeded in
proving on this basis the possibility of replacement of the time
averages by the phase averages. Numerous expositions of such
proofs contain grave mistakes. Those authors (as, for instance,
P. Hertz in his known treatise) who do not wish to base their
proofs on false arguments, have to introduce several additional
hypotheses.

All this history of the ergodic problem appears to us in-
structive since it makes the efficacy of introducing various
hypotheses which are not supported by any argument very
doubtful. As is usual in such cases, when we are not able to
submit really convincing arguments in favor of replacing the
time averages by the phase averages, it is preferable, and also
simpler, to accept as the “ergodic hypothesis” the very possi-
bility of such a replacement, and then to judge the theory
constructed on the basis of this hypothesis, by its practical
success or failure. This, of course, does not mean that the
theoretical justification of the accepted hypothesis is to be
forgotten. On the contrary, this question remains one of the
most fundamental in the statistical mechanics. We wish only
to say that the reduction of this hypothesis to others is little
justified, and does not appear to us to be very efficient.

After several decades of almost fruitless discussions in con- 
nection with the ergodic problem, it was only in 1931 that 
the theorem of Birkhoff revived the problem.* From our point 
of view the essence of these new researches consists in em- 
phasizing the notion of the metrical indecomposability and 
in establishing its close connection with the ergodic problem. 
Quite often one hears, particularly from the side of physicists, 
that Birkhoff’s results do not give anything for the solution of 
the ergodic problem, but only reduce it to another problem— 
the justification of the metric indecomposability of the surfaces 
of constant energy, and, in this sense, are similar to introducing 
ergodic hypotheses as done by earlier authors. Although we do 
not wish to overestimate the role of Birkhoff’s results for the 
foundation of the statistical mechanics, we cannot share such 
a point of view.** The enormous, interesting, and significant 
literature which has developed on the basis of Birkhoff’s re- 
searches during the last decade shows that these researches in 
the ergodic problem shed light on new problems which remained 
unknown beforehand, and discovered a most fertile 
field for new researches. In all mathematical justifications of 
various special fields usually there occur moments when, 
although it does not solve any concrete problem, the introduction 
of some appropriate notions coordinates and organizes the 
whole problem in such a fundamental way that the work of an 
investigator is turned from a chaotic and almost helpless 
wandering into a sensible and planned conquest of new scientific 
facts. There is every reason to believe that the researches of 


*Footnote of the translator. The first form of an ergodic theorem was a 
somewhat weaker statement proved by J. v. Neumann shortly before 
Birkhoff. 

**Footnote of the translator. In fact, a general theorem showing the 
existence of ergodic transformations on quite general manifolds (or phase 
spaces) was proved by J. Oxtoby and S. Ulam (Ann. of Math. 42, 874 
(1941)). Their results imply that in a certain sense almost every continuous 
transformation is metrically transitive. 
Birkhoff represent such a moment in the development of the 
ergodic problem. In the following paragraphs we intend to state 
some observations concerning the present status of the ergodic 
problem. 


12. On metric indecomposability of reduced manifolds. As 
we know from section 6 of chapter II, the metric indecomposability 
of the given surface of constant energy insures that the 
time averages of any summable function $f(P)$ along almost all 
trajectories situated on this surface coincide with the phase 
average $\bar f$ of this function over this surface. But it is easy to 
see that if we require this property of almost all trajectories 
and all summable functions, then the condition of metric indecomposability 
is also necessary. Indeed, if the given surface 
of constant energy is metrically decomposable, then it can be 
split in two invariant sets $M_1$ and $M_2$, each of which has a 
positive measure. The summable function $f(P)$ which assumes 
the value 0 on $M_1$ and the value 1 on $M_2$ cannot have the 
same time averages along almost all trajectories (these time 
averages are either 0 or 1, while the phase average has some 
intermediate value). 

Thus we see that the metric indecomposability of the sur- 
faces of constant energy is a necessary and sufficient condition 
for a positive answer to the ergodic problem stated in a certain 
precisely determined sense. This fact alone shows the essential 
advantage of the investigations of Birkhoff as compared with 
the introduction of old “ergodic hypotheses”. 

We now pass on to the question of what general considerations 
may make the metric indecomposability of surfaces of 
constant energy more or less plausible. Let $\varphi = \varphi(x_1, \dots, x_{2s})$ be 
one of the “free” integrals of the equations of motion of the 
given system, that is, an integral independent of the integral 
of energy and not containing time explicitly. If the function $\varphi$ 
remained constant on each surface of constant energy, then the 
value of the integral $\varphi$ would have been uniquely determined 
by the value of the integral of energy, and these two integrals 
would not be independent, contrary to our assumption. Hence 
the function $\varphi$ cannot remain constant on each surface of 
constant energy, and, being continuous, cannot remain constant 
at almost all points of such a surface. 

Let us take a surface of constant energy on which the function 
$\varphi$ is not constant almost everywhere. Then we can find 
such a real number $a$ that each of the two parts of this surface, 
characterized respectively by the inequalities $\varphi > a$ and 
$\varphi \le a$, will be of positive measure.* But since $\varphi$ is an integral 
of the equations of motion of our system, each of these two 
parts is an invariant set. This shows that our surface of constant 
energy cannot be metrically indecomposable. 

This elementary argument leaves an impression that the 
metric indecomposability of surfaces of constant energy is a 
hypothesis, which, like the Boltzmann hypothesis, never can 
be realized, and therefore should be rejected. However, this 
would mean a complete solution of the ergodic problem in the 
negative sense. 

Formally, there are no objections against the above argument, 
and we actually have to agree that the answer to the 
ergodic problem, at least in the form in which it was formulated 
above, should be in the negative. We shall see, however, that if 
we introduce some sensible and natural modifications in the 
formulation of the problem, we may obtain a positive answer, 
at least in some cases. 

So far it was always self-evident, although not stated explicitly, 
that two distinct points of the phase space represent 
two distinct states of our mechanical system. Actually, however, 
in many cases, to distinct points of the space $\Gamma$ there may 
correspond identical states of the mechanical system. 

Let us explain this. In many cases we are forced to characterize 
the same physical state of the system not by one, but 
by several sets (sometimes even by infinitely many) of values 
of its dynamic coordinates. Thus for a point which moves uniformly 
along a circumference, if we determine its position by 
the central angle counted from some fixed radius, we must 
consider as identical the states for which the values of this 
angle differ by a multiple of $2\pi$. 

*For the proof see the footnote on p. 30. 

On the other hand it is obvious that every physical quantity 
which characterizes the state of the given system must be 
determined uniquely by this state. The phase function which 
interprets this physical quantity in our theory, must therefore 
assume the same value at any two points of the phase space 
corresponding to the same state of the given system. We shall 
call normal every phase function which satisfies this condition. 

Since, in view of the preceding considerations, all physical 
quantities for which there may arise the question of comparison 
of their theoretical values with experimental data are inter- 
preted by normal phase functions, we will lose nothing if in 
formulating the ergodic problem we shall state that not all 
but only normal summable functions should satisfy its require- 
ment. Then the condition of metric indecomposability will 
cease to be necessary; it will be replaced by a broader necessary 
and sufficient condition which can be easily formulated. 

We shall call normal every subdivision of the given surface 
of constant energy in two invariant parts of positive measure, 
such that all points of the surface which correspond to the same 
state of the system (we shall call such points physically equiva- 
lent) belong to the same part of the surface. A surface which 
does not admit of a normal subdivision we shall call metrically 
indecomposable in the extended sense. 

Theorem. In order that time averages of any normal summable 
function taken along almost all trajectories situated on the given 
surface of constant energy, would coincide with the phase average 
of this function over the given surface, it is necessary and sufficient 
that this surface is metrically indecomposable in the extended 
sense. 

A proof of this theorem can be carried through in complete 
analogy with the arguments of section 6 of the preceding 
chapter (sufficiency) and of the beginning of the present para- 
graph (necessity). We leave it to the reader. 

After this modification the ergodic problem is reduced to the 
question of whether, generally speaking, the surfaces of constant 
energy of the mechanical systems under consideration are 
metrically indecomposable in the extended sense. First we 
shall show that the argument which was used above in establishing 
the impossibility of the metric indecomposability in the 
original sense, does not give anything directly if we assume 
the metric indecomposability in the extended sense. 

Indeed, in this argument we subdivided the given surface 
of constant energy into two parts, placing a point in the one 
or the other of these parts according to the value assumed at 
this point by a certain integral $\varphi$. But if now we are interested 
only in normal subdivisions, then together with a given point 
all physically equivalent points must belong to the same part. 
If $\varphi$ is a normal integral, that is, assumes the same value at 
all physically equivalent points, our argument remains valid. 
But if our integral $\varphi$ is not normal, then, in determining the 
sets $M_1$ and $M_2$ we cannot start by arbitrarily subdividing the 
set of all values assumed by the integral $\varphi$ in two parts. If 
we want the subdivision $(M_1, M_2)$ of the surface $\Sigma_x$ to be 
normal, we must see to it that the values assumed by $\varphi$ at any 
two physically equivalent points are always placed in the same 
part. This requirement (as we shall see in an example) may 
turn out to be incompatible with the requirement that $M_1$ and 
$M_2$ be invariant sets of positive measure. In such a case our 
argument becomes invalid, and the question of possibility of 
metric indecomposability in the extended sense remains open. 

Later we shall give the simplest of known examples of such 
a situation. Now we note that the above argument shows the 
impossibility of metric indecomposability even in the extended 
sense, if among the free integrals there exists at least one normal 
integral. In particular, the energy integral being always normal, 
necessarily has to be fixed. If the system has no other normal 
integrals of motion, then we can raise the question of the metric 
indecomposability in the extended sense on the surfaces of 
constant energy. 

Let us turn now to an example of the metric indecomposability 
in the extended sense. Consider a system with two 
degrees of freedom, whose situation is determined by two cyclic 
coordinates $\varphi$, $\psi$ with a period 1. This means that for any 
integers $k$ and $l$, the pair of coordinates $\varphi + k$, $\psi + l$ represents 
the same situation of the system as the pair $\varphi$, $\psi$ (motion of a 
particle on the surface of a torus). The Hamiltonian function 
we take to be 

$$H = \tfrac{1}{2}\left(\varphi'^2 + \psi'^2\right),$$

where $\varphi'$, $\psi'$ are dynamic coordinates canonically conjugate to 
$\varphi$, $\psi$. If we denote by a dot the differentiation with respect to 
time, we can write the canonical system of the equations of 
motion in the form 

$$\dot{\varphi} = \varphi', \qquad \dot{\psi} = \psi', \qquad \dot{\varphi}' = 0, \qquad \dot{\psi}' = 0.$$

Three independent integrals which do not contain time explicitly 
are given by the functions 

$$\varphi', \qquad \psi', \qquad \varphi\psi' - \psi\varphi'.$$

The first two of these integrals are normal (since two physically 
equivalent points can differ by integer values of the 
variables $\varphi$ and $\psi$, but $\varphi'$ and $\psi'$ will have for them the same 
values). The third integral is not normal. Indeed, let $I$ be 
the value of this integral for some state of the system 
$(\varphi, \psi, \varphi', \psi')$. For any integers $k$ and $l$ the point 
$(\varphi + k, \psi + l, \varphi', \psi')$ represents the same state of the system, but the value 
of the third integral at this point is $I + k\psi' - l\varphi'$, which in 
general is different from $I$. Furthermore, if the values $\varphi'$ and 
$\psi'$ are incommensurable, the third integral assumes an everywhere 
dense set of values for the same state of the system. 
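
The behavior of the three integrals under integer shifts can be checked numerically. The following is only a minimal sketch: the particular state, the choice $\varphi' = 1$, $\psi' = \sqrt{2}$, the target value 0.123 and the helper names `third_integral` and `shifts_near` are our own illustrative assumptions and are not part of the text.

```python
import math

def third_integral(phi, psi, phi_p, psi_p):
    """Value of the integral phi*psi' - psi*phi' at a phase point."""
    return phi * psi_p - psi * phi_p

# Hypothetical state; the momenta 1 and sqrt(2) are incommensurable.
phi, psi, phi_p, psi_p = 0.3, 0.7, 1.0, math.sqrt(2.0)
base = third_integral(phi, psi, phi_p, psi_p)

# Shifting the cyclic coordinates by integers gives a physically
# equivalent point, yet the integral changes by k*psi' - l*phi'.
for k, l in [(1, 0), (0, 1), (3, -2)]:
    shifted = third_integral(phi + k, psi + l, phi_p, psi_p)
    assert math.isclose(shifted, base + k * psi_p - l * phi_p)

def shifts_near(target, tol, bound=200):
    """Integer pairs (k, l) with |k|, |l| <= bound for which the change
    k*psi' - l*phi' of the integral lies within tol of target."""
    return [(k, l)
            for k in range(-bound, bound + 1)
            for l in range(-bound, bound + 1)
            if abs(k * psi_p - l * phi_p - target) < tol]

# Any prescribed change can be approximated by a suitable shift,
# which illustrates the density of the values of the third integral.
hits = shifts_near(0.123, 1e-2)
```

The momenta $\varphi'$, $\psi'$ are not touched by the shift at all, which is why the first two integrals are normal; only the third one "sees" the choice of representative point.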

According to what was said above, if we desire to construct 
a reduced manifold metrically indecomposable in the extended 
sense, we have to fix the values of the integrals $\varphi'$ and $\psi'$. 
Let $\varphi' = \alpha$, $\psi' = \beta$, where $\alpha$ and $\beta$ are any two incommensurable 
real numbers. The reduced space $(\varphi, \psi)$ will be then a 
two-dimensional part of the four-dimensional phase space, and 
will be also a part of the surface of constant energy 
$E = \tfrac{1}{2}(\alpha^2 + \beta^2)$. Let us investigate whether this plane $(\varphi, \psi)$ 
admits of normal decompositions. 
Since every square 

$$k \le \varphi < k + 1, \qquad l \le \psi < l + 1, \tag{22}$$

where $k$ and $l$ are any integers, is physically equivalent to any 
other such square, and since in a normal subdivision all 
physically equivalent points must belong to the same part, for 
each normal subdivision of the plane $(\varphi, \psi)$ all such squares 
will be subdivided in parts which are mutually congruent. 

A little explanation is necessary here: the sets $M_1$ and $M_2$ 
which normally subdivide the plane $(\varphi, \psi)$ clearly will have 
infinite measure, which is not foreseen by the definition of a 
normal subdivision. In our case, however, this cannot cause 
any difficulty, since, in view of the physical equivalency of any 
two squares of the type (22), to take the phase average of any 
normal phase function we could restrict ourselves to the consideration 
of the fundamental square $0 \le \varphi < 1$, $0 \le \psi < 1$. 
If we transfer any normal subdivision of this square to all 
squares (22), we obtain a subdivision of the plane $(\varphi, \psi)$, which 
naturally may be called a normal subdivision of the plane 
$(\varphi, \psi)$. 

Let $(M_1, M_2)$ be any such normal subdivision. Consider the 
side $\varphi = 0$ of the fundamental square. Let the point $(0, b)$ 
$(0 \le b < 1)$ of this side belong to the set $M_1$. We assert that 
in this case the point $(0, \rho(b + k\beta/\alpha))$,* where $k$ is any integer, 
also belongs to $M_1$. Indeed the set $M_1$, being invariant, contains 
together with the point $(0, b)$ the whole trajectory passing 
through this point, that is, all points of the type $(\alpha t, b + \beta t)$, 
and, in particular (for $t = k/\alpha$) the point $(k, b + k\beta/\alpha)$. 
But this point is physically equivalent to the point 
$(0, \rho(b + k\beta/\alpha))$ of the fundamental square, and therefore, since our 
subdivision is normal, must also belong to $M_1$. The numbers 
$\rho(b + k\beta/\alpha)$ $(-\infty < k < \infty)$, if $\alpha$ and $\beta$ are incommensurable, 
constitute an everywhere dense set, hence the set $M_1$ is everywhere 
dense on the side $\varphi = 0$ of our square. Let $A_1$ be the set 
which is common to $M_1$ and this side. We can assert that if $A_1$ is 
measurable, it must have positive (linear) measure. Indeed, if 
the measure of $A_1$ were equal to zero, then the part of $M_1$ in 
our square which consists of the family of parallel lines of slope 
$\beta/\alpha$, passing through the points of $A_1$ would have been of 
plane measure zero. In view of the mutual congruency of subdivisions 
induced in all squares, we would have $\mathfrak{M}M_1 = 0$, 
contrary to our assumption. 

*Here $\rho(z) = z - [z]$ is the “fractional part” of the real number $z$. 
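
The density of the return points $\rho(b + k\beta/\alpha)$ on the side $\varphi = 0$ can be observed directly. The sketch below is an illustration under our own assumptions: the values $b = 0.25$, $\alpha = 1$, $\beta = \sqrt{2}$ (and, for contrast, the commensurable pair $\alpha = 2$, $\beta = 1$) as well as the helper names are hypothetical. It measures the largest gap left on the side by the first 2000 return points.

```python
import math

def frac(z):
    """Fractional part rho(z) = z - [z]."""
    return z - math.floor(z)

def return_points(b, alpha, beta, n):
    """Sorted points rho(b + k*beta/alpha), k = 0..n-1, on the side phi = 0."""
    return sorted(frac(b + k * beta / alpha) for k in range(n))

def largest_gap(points):
    """Largest gap between neighbouring points of a sorted list,
    the side [0, 1) being closed into a circle."""
    gaps = [q - p for p, q in zip(points, points[1:])]
    gaps.append(points[0] + 1.0 - points[-1])  # wrap-around gap
    return max(gaps)

b = 0.25
# Incommensurable alpha, beta: the return points fill the side densely.
gap_irrational = largest_gap(return_points(b, 1.0, math.sqrt(2.0), 2000))
# Commensurable alpha, beta (beta/alpha = 1/2): only two distinct points occur.
gap_rational = largest_gap(return_points(b, 2.0, 1.0, 2000))
```

In the incommensurable case the largest gap shrinks as more return points are taken, while in the commensurable case it stays fixed, so that only in the first case is an invariant set containing $(0, b)$ forced to be dense on the side.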

Let $\varepsilon > 0$ be arbitrarily small, and $\delta(b_1, b_2)$ such an interval 
on the side $\varphi = 0$ of the fundamental square, in which the 
average density of the set $A_1$ is greater than $1 - \varepsilon$, that is 

$$\mathfrak{M}(\delta \cdot A_1) > (1 - \varepsilon)\,\mathfrak{M}\delta$$

(such an interval exists in view of a known theorem concerning 
the density of measurable sets). After the time $t = k/\alpha$ the 
interval $\delta$ passes over into an interval of the same length on 
the line $\varphi = k$, which, in its turn, is equivalent to some interval 
(or a pair of intervals) $\delta'$ of the side $\varphi = 0$ of our square, and 
we must have 

$$\mathfrak{M}\delta' = \mathfrak{M}\delta.$$

Since the set $M_1$ is invariant, 

$$\mathfrak{M}(\delta' \cdot A_1) = \mathfrak{M}(\delta \cdot A_1),$$

so that 

$$\mathfrak{M}(\delta' \cdot A_1) > (1 - \varepsilon)\,\mathfrak{M}\delta'.$$


Varying $k$ we obtain, as it is easy to see, along the side $\varphi = 0$ 
of our square a dense (because of the irrationality of $\beta/\alpha$) 
set of intervals $\delta'$ of equal length within which the mean 
density of the set $A_1$ exceeds $1 - \varepsilon$. It follows that 

$$\mathfrak{M}A_1 > 1 - \varepsilon,$$

or, because of the arbitrary value of $\varepsilon$, 

$$\mathfrak{M}A_1 = 1.$$

Writing $A_2$ for the set complementary to $A_1$ (on the side $\varphi = 0$) 
we have 

$$\mathfrak{M}A_2 = 0.$$

As we have already seen this would lead to $\mathfrak{M}M_2 = 0$, 
which contradicts our assumption. This argument shows that 
the plane $(\varphi, \psi)$ is metrically indecomposable in the general 
(that is, extended) sense of this word. The question as to whether this metric indecomposability 
can be considered as the general property of 
the broad class of systems encountered in statistical physics 
cannot be answered at the present time. We notice however 
that many other authors succeeded in the construction of 
rather general examples of the above given type and gave 
arguments in favor of the generality of the above statement. 
We will not enter here into the discussion of these problems, 
but will turn to the analysis of those cases which are of greatest 
importance in statistical mechanics. 
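
What the extended indecomposability of the plane $(\varphi, \psi)$ implies for a normal phase function can be illustrated by a direct computation of a time average. The following is a sketch under our own assumptions: the normal function $\cos 2\pi(\varphi - \psi)$, whose phase average over the fundamental square is 0, the initial ordinate $b = 0.1$, the time span and the function name `time_average` are all chosen only for illustration.

```python
import math

def time_average(f, alpha, beta, b, T, steps=200_000):
    """Midpoint-rule estimate of (1/T) * integral_0^T f(alpha*t, b + beta*t) dt
    along the trajectory starting at the point (0, b)."""
    dt = T / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        total += f(alpha * t, b + beta * t)
    return total / steps

def f(phi, psi):
    """A normal phase function: it has period 1 in both cyclic coordinates.
    Its phase average over the fundamental square equals 0."""
    return math.cos(2.0 * math.pi * (phi - psi))

# Incommensurable velocities: the time average approaches the phase average 0.
avg_incommensurable = time_average(f, 1.0, math.sqrt(2.0), 0.1, T=500.0)
# Commensurable velocities: the time average depends on the trajectory
# (here it equals cos(2*pi*b)) and differs from the phase average.
avg_commensurable = time_average(f, 1.0, 1.0, 0.1, T=500.0)
```

The contrast between the two cases is the point of the example: for commensurable $\alpha$, $\beta$ the plane admits normal subdivisions and the replacement of time averages by phase averages fails even for normal functions.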


13. The possibility of a formulation without the use of 
metric indecomposability. All the results obtained by Birk- 
hoff and his followers (as well as all considerations of the pre- 
vious section) pertain to the most general type of dynamic 
systems, and consider different problems connected with them. 
The authors of these studies have been working, as a rule, on 
the development of the so-called “general dynamics’’—an im- 
portant and interesting branch of modern mechanics. They 
have not been interested in the problem of the foundation of 
statistical mechanics which is our primary interest in the 
present book. Their aim was to obtain the results in the most 
general form; in particular all these results pertain equally to 
the systems with only few degrees of freedom as well as to the 
systems with a very large number of degrees of freedom. 

From our point of view we must deviate from this tendency. 
We would unnecessarily restrict ourselves by neglecting the 
special properties of the systems considered in statistical 
mechanics (first of all their fundamental property of having a 
very large number of the degrees of freedom), and demanding 
the applicability of the obtained results to any dynamic system. 
Furthermore, we do not have any basis for demanding the 
possibility of substituting phase averages for the time averages 
of all functions; in fact the functions for which such substitution 
is desirable have many specific properties which make such a 
substitution apparent in these cases. In the present section we 
shall make several elementary remarks along these lines. 

In the field of statistical mechanics we are, first of all, 
helped by the fact that the majority of phase functions describing 
the most important physical quantities exhibit a very 
peculiar behavior (compare section 10). In fact these functions 
are, as a rule, approximately constant on the surfaces of constant 
energy, i.e., with the exception of a set of points of a 
very small measure, they possess on each such surface values 
which are very close to a certain number characteristic of the 
surface. The reasons for such peculiar behavior will be partially 
discussed later in this chapter, and we will return to them in 
more detail in the later chapters. We will remark here, however, 
that these reasons arise partially from the peculiar properties 
of mechanical systems treated in statistical physics 
(breaking up into a large number of components), and partially 
from the specific properties of the functions with which we are 
dealing (these are, as a rule, the “sum-functions”, i.e., the 
sums of functions each depending on the dynamical coordinates 
of only one component). It is clear without calculation that, 
for such functions, the time averages taken along most trajectories 
must be very close to the corresponding phase averages. 
If desired, however, an approximate proof of the above 
statement can be given along the following lines: 

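Let us assume that the values of the function $f(P)$ on the 
surface $\Sigma_x$ (except in a set of points of a very small measure) 
are very close to a certain number $A$. Then, unless $f(P)$ assumes 
at this small set of points some particularly large values, the 
quantity 

$$I = \frac{1}{\Omega(x)} \int_{\Sigma_x} |f(P) - A| \, \frac{d\Sigma}{\operatorname{grad} E}$$

will in general be small. Assume for simplicity $A = 0$, which 
apparently does not reduce the generality of our considerations. 
Assume also, as we did before, that 

$$\frac{1}{C} \int_0^C f(P, t)\, dt = f_C(P), \qquad \lim_{C \to \infty} f_C(P) = \hat f(P);$$

finally let $M_a$ be the set of points on the surface $\Sigma_x$ for which 
$|\hat f(P)| > a$, and $M_a^C$ the set of points for which $|f_C(P)| > a/2$. 
Since under the condition $C \to \infty$ (on the surface $\Sigma_x$), 
$f_C(P) \to \hat f(P)$, we can write for sufficiently large $C$*: 

$$\mathfrak{M}M_a^C > \tfrac{1}{2}\,\mathfrak{M}M_a ;$$

consequently: 

$$\frac{a\,\mathfrak{M}M_a}{4} < \frac{a}{2}\,\mathfrak{M}M_a^C \le \int_{M_a^C} |f_C(P)| \, \frac{d\Sigma}{\operatorname{grad} E} \le \frac{1}{C} \int_{\Sigma_x} \frac{d\Sigma}{\operatorname{grad} E} \int_0^C |f(P, t)|\, dt$$

$$= \frac{1}{C} \int_0^C dt \int_{\Sigma_x} |f(P, t)| \, \frac{d\Sigma}{\operatorname{grad} E} = \frac{1}{C} \int_0^C dt \int_{\Sigma_x} |f(P)| \, \frac{d\Sigma}{\operatorname{grad} E} = I\,\Omega(x),$$

from which follows: 

$$\frac{\mathfrak{M}M_a}{\Omega(x)} < \frac{4I}{a};$$

if for example we choose $a = I^{1/2}$ we obtain: 

$$\frac{\mathfrak{M}M_{I^{1/2}}}{\Omega(x)} < 4 I^{1/2},$$

*To prove this inequality assume that $\bar M_a^C$ is the set complementary to $M_a^C$. 
If $P \in M_a$ and $P \in \bar M_a^C$, we evidently have $|f_C(P) - \hat f(P)| > a/2$. 
Hence, because of convergence in measure under the condition $C \to \infty$, we 
have $\mathfrak{M}(M_a \cdot \bar M_a^C) \to 0$, or $\mathfrak{M}(M_a \cdot M_a^C) \to \mathfrak{M}M_a$. From this it follows that, 
for sufficiently large $C$, $\mathfrak{M}(M_a \cdot M_a^C) > \tfrac{1}{2}\mathfrak{M}M_a$, or a fortiori 
$\mathfrak{M}M_a^C > \tfrac{1}{2}\mathfrak{M}M_a$. 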
so that the relative measure of the set of points on the surface 
$\Sigma_x$ for which $|\hat f(P)|$ exceeds the small quantity $I^{1/2}$ is smaller 
than the small quantity $4I^{1/2}$. It is clear that in order to 
reach practical conclusions from the above calculation, we 
must estimate in each particular case the order of magnitude 
of the small quantity $I$. In many cases such an estimate is 
actually possible. However, it is also possible to make some 
estimates of quite a general nature. Thus, for example, we will 
see in the following chapters that for a physical system formed 
by $n$ molecules, the most important phase functions are of the 
order of magnitude $n$. The “dispersion” of such a function, 
i.e., the quantity 

$$J = \frac{1}{\Omega(x)} \int_{\Sigma_x} [f(P) - A]^2 \, \frac{d\Sigma}{\operatorname{grad} E}$$

has also, as a rule, the order of magnitude $n$ (Chapter VIII, 
Sec. 36). Since, because of the Schwarz inequality, 
$I \le J^{1/2} = O(n^{1/2})$, we find, choosing $a = I^{3/2}$ (order of magnitude 
$n^{3/4}$), that the relative measure of the set of points for 
which 

$$|\hat f(P) - A| > K n^{3/4}, \qquad \left| \frac{\hat f(P) - A}{A} \right| > K_1 n^{-1/4},$$

is a small quantity of an order of magnitude not less than 
$I/a = O(n^{-1/4})$ ($K$ and $K_1$ being positive constants). Since the 
quantity $A$ can be assumed to represent the phase average $\bar f$ of 
the function $f(P)$ on the surface $\Sigma_x$, the above considerations 
supply certain approximate qualitative estimates pertinent to 
the substitution of phase averages for time averages. These 
almost trivial considerations lead us to suppose that, at least 
in the fundamental problems of statistical mechanics, and 
especially for practical purposes, we can avoid the use of the 
ergodic theorems of general dynamics. 
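
The approximate constancy of sum-functions on a surface of constant energy can be observed in a simple numerical experiment. The sketch below rests on assumptions of our own: the energy is taken as $E = \tfrac{1}{2}\sum x_i^2$ with $n$ coordinates (so that $\operatorname{grad} E$ has constant magnitude on the sphere $\sum x_i^2 = n$ and the invariant measure is the uniform one), the sum-function is the sum of $x_i^2$ over the first half of the coordinates, with phase average $A = n/2$, and the threshold $n^{-1/4}$ for the relative deviation is chosen by analogy with the estimate above. None of these choices, nor the function names, come from the text.

```python
import math
import random

def sample_energy_surface(n, rng):
    """Random point distributed uniformly on the sphere x_1^2 + ... + x_n^2 = n
    (a normalised Gaussian vector)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    scale = math.sqrt(n / sum(v * v for v in g))
    return [scale * v for v in g]

def sum_function(x):
    """A sum-function: each term depends on a single coordinate only."""
    return sum(v * v for v in x[: len(x) // 2])

def fraction_far(n, samples, rng):
    """Fraction of sampled points with |f - A| / A > n**(-1/4), where A = n/2."""
    A = n / 2.0
    far = 0
    for _ in range(samples):
        f = sum_function(sample_energy_surface(n, rng))
        if abs(f - A) / A > n ** -0.25:
            far += 1
    return far / samples

rng = random.Random(0)  # fixed seed so that the experiment is reproducible
small_system = fraction_far(100, 400, rng)
large_system = fraction_far(2000, 400, rng)
```

In this model the exceptional set already has a small relative measure for $n = 100$ and becomes practically unobservable for $n = 2000$, in qualitative agreement with the estimates given above.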

We make one more remark. In this as well as in the previous 
sections, we have been satisfied with such situations in which 
the desirable phenomenon was taking place at all points of the 
surface $\Sigma_x$ except at a set of points with a very small measure 
(sometimes exactly zero). It is clear that, taking this point of 
view, we make the definite assumption that, if some collection 
of the states of the system is represented on the surface $\Sigma_x$ 
by a set of points of a very small relative measure, then the 
states belonging to this collection appear very infrequently in 
practice. 

The exact mathematical formulation of this assumption in 
terms of the theory of probability is as follows: considering the 
different states of the system (i.e., the points of the surface 
$\Sigma_x$) as random events, we assume that they are subject to any, 
but necessarily absolutely continuous, distribution law (i.e., 
such a distribution law for which the collection of very small 
measure possesses a very small probability). Such an assump- 
tion is in fact absolutely unavoidable in any comparison of 
our theory with reality. As a working hypothesis it is quite 
natural and gives us a free hand in selecting free distributions 
appropriate in practice. 

Let us consider, finally, one more simple argument pertinent 
to the same group of ideas. Let us call a summable function 
$f(P)$ ergodic if, for almost all trajectories, $\hat f(P) = \bar f$. As remarked 
before, most of the phase functions considered in 
statistical mechanics are of the “sum-function” type, i.e., represent 
the sum of functions each one depending only on the coordinates 
of a single molecule. It is clear that such a function will 
be ergodic if each of its components is also ergodic, since the 
averages $\hat f$ and $\bar f$ are both linear transformations. Thus, to 
prove the ergodic nature of such functions, it suffices to prove 
it for the functions corresponding to single molecules. We will 
give now some considerations in favor of the above statement. 

Let $f(P)$ be a function of the coordinates of only one molecule. 
Without restricting the generality of the argument, assume 
$\bar f = 0$. Let $M$ be the upper limit of the function $|f|$ on the 
surface $\Sigma_x$, and 

$$Df = \overline{[f(P)]^2}, \qquad R(u) = \frac{1}{Df}\, \overline{f(P, t)\, f(P, t + u)}$$

(it is apparent that the last quantity is independent of $t$). 
Thus, $Df$ is the phase dispersion of the function $f(P)$, and 
$R(u)$ is the phase coefficient of correlation connecting $f(P, t)$ 
and $f(P, t + u)$. Because of the fact that the given system 
consists of a very large number of molecules it is natural to 
expect that knowledge of the state of a single molecule at a 
certain moment does not permit us to predict anything (or 
almost anything) about the state in which this molecule will 
be found after a sufficiently long time. For example, the exact 
knowledge of the energy of a given molecule at a given moment 
of time cannot give us any indications concerning the value 
which this energy will have several hours later (because of the 
large number of collisions suffered by the molecule during this 
time interval). This statement seems to us so natural that it 
would be difficult to think otherwise; in fact, this represents the 
basic idea of “molecular chaos”. Expressing this in terms of the 
theory of probability we can say that the stochastic dependence 
between the quantities $f(P, t)$ and $f(P, t + u)$ decreases very 
rapidly with increasing $u$, and almost entirely vanishes for 
appreciably large values of $u$; in particular it means that $R(u)$ 
must be small for large $u$ and that $R(u) \to 0$ for $u \to \infty$. The 
only weak point of the above argument which must be mentioned 
here is that, since $R(u)$ is the phase coefficient of correlation, 
we cannot be sure to what extent it can be used to 
characterize the stochastic dependence between the quantities 
in question. However, the relation $R(u) \to 0$ $(u \to \infty)$ represents 
a well defined property of the function $f(P)$ and of the natural 
motion on the surface $\Sigma_x$, a property which must necessarily 
take place if the initial correlation between the quantities 
$f(P, t)$ and $f(P, t + u)$ becomes, “generally speaking”, weaker 
with increasing $u$. The expression “generally speaking” attains 
exact meaning in terms of the measure on the surface $\Sigma_x$, and, 
as we have seen above, the stochastic interpretation of this 
measure represents the necessary postulate of the entire theory. 


From this point of view it is interesting to prove the fol- 
lowing theorem: 
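Theorem: If $R(u) \to 0$ for $u \to \infty$, the function $f(P)$ is ergodic. 

Proof: Assuming (as before) 

$$\lim_{C \to \infty} \frac{1}{C} \int_0^C f(P, t)\, dt = \hat f(P),$$

we have 

$$[\hat f(P)]^2 = \left\{ [\hat f(P)]^2 - \frac{1}{C^2} \int_0^C \!\! \int_0^C f(P, u) f(P, v)\, du\, dv \right\} + \frac{1}{C^2} \int_0^C \!\! \int_0^C f(P, u) f(P, v)\, du\, dv.$$

Writing $Q$ for the above expression in brackets, and $G_1$ and 
$G_2$ respectively for the set of points on the surface $\Sigma_x$ for which 
$|Q| < \varepsilon$ and its complementary set, we have for the phase average of $Q$ 

$$\frac{1}{\Omega(x)} \int_{\Sigma_x} Q \, \frac{d\Sigma}{\operatorname{grad} E} = \frac{1}{\Omega(x)} \int_{G_1} Q \, \frac{d\Sigma}{\operatorname{grad} E} + \frac{1}{\Omega(x)} \int_{G_2} Q \, \frac{d\Sigma}{\operatorname{grad} E};$$

since 

$$Q = [\hat f(P)]^2 - \left[ \frac{1}{C} \int_0^C f(P, t)\, dt \right]^2,$$

we conclude that under the condition $C \to \infty$, $Q \to 0$ almost 
everywhere on the surface $\Sigma_x$. Consequently $\mathfrak{M}G_2 \to 0$, from 
which follows that for sufficiently large $C$: 

$$\frac{\mathfrak{M}G_2}{\Omega(x)} = \frac{1}{\Omega(x)} \int_{G_2} \frac{d\Sigma}{\operatorname{grad} E} < \varepsilon;$$

since obviously 

$$\frac{1}{\Omega(x)} \int_{\Sigma_x} f(P, u) f(P, v) \, \frac{d\Sigma}{\operatorname{grad} E} = Df \cdot R(u - v)$$

and 

$$|Q| < \varepsilon \quad (P \in G_1), \qquad |Q| \le M^2 \quad (P \in G_2),$$

we conclude that: 

$$\overline{[\hat f(P)]^2} \le \left| \frac{Df}{C^2} \int_0^C \!\! \int_0^C R(u - v)\, du\, dv \right| + \varepsilon + M^2 \varepsilon.$$

If $|R(u)| < \varepsilon$ for $|u| > a$ we get (taking into consideration 
that $|R(u)| \le 1$ for any $u$) 

$$\overline{[\hat f(P)]^2} \le \frac{2\,Df}{C^2} \int_0^C du \left[ \int_0^{\max(0,\,u - a)} |R(u - v)|\, dv + \int_{\max(0,\,u - a)}^{u} |R(u - v)|\, dv \right] + \varepsilon + M^2 \varepsilon$$

$$\le \frac{2a}{C}\, Df + \varepsilon\,(Df + 1 + M^2).$$

Since we can choose $\varepsilon$ arbitrarily small and $C$ arbitrarily large: 

$$\overline{[\hat f(P)]^2} = 0,$$

so that almost everywhere on $\Sigma_x$ 

$$\hat f(P) = 0 = \bar f.$$

This proves the ergodic nature of the function $f(P)$. 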
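
The decisive step of the proof is that the averaged correlation $\frac{1}{C^2}\int_0^C\!\int_0^C R(u - v)\,du\,dv$ becomes small for large $C$ as soon as $R(u) \to 0$. This can be illustrated numerically. In the sketch below the correlation coefficient $R(u) = e^{-|u|}$ is a hypothetical choice of ours (the text does not specify any particular $R$); for it the quantity in question has the closed form $2(C - 1 + e^{-C})/C^2$, which serves as a check on the numerical integration.

```python
import math

def correlation_term(R, C, steps=400):
    """Midpoint-rule estimate of (1/C^2) times the double integral
    of R(u - v) over the square 0 <= u, v <= C."""
    h = C / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        for j in range(steps):
            v = (j + 0.5) * h
            total += R(u - v)
    return total * h * h / (C * C)

def R(u):
    """Hypothetical correlation coefficient with R(0) = 1 and R(u) -> 0."""
    return math.exp(-abs(u))

def exact(C):
    """Closed form of the same quantity for R(u) = exp(-|u|)."""
    return 2.0 * (C - 1.0 + math.exp(-C)) / (C * C)

# The averaged correlation decreases roughly like 2/C as C grows.
values = {C: correlation_term(R, C) for C in (1.0, 10.0, 100.0)}
```

For this $R$ the decay is of the order $1/C$, which corresponds to the term $\frac{2a}{C}\,Df$ in the final estimate of the proof.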


CHAPTER IV 


REDUCTION TO THE PROBLEM OF THE 
THEORY OF PROBABILITY 


14. Fundamental distribution law. We shall now consider 
the aggregate of $2s$ dynamic variables $(x_1, x_2, \dots, x_{2s})$, 
determining the state of a given system $G$ with $s$ degrees of 
freedom, as a multidimensional probability quantity (probability 
vector). We shall assume as usual that the energy $E$ of 
the system has a certain constant value $a$, so that all possible 
values of the probability vector $(x_1, x_2, \dots, x_{2s})$ correspond 
to the points on a certain surface $\Sigma_a$. The probability that the 
representative point of the system will fall within a certain 
set $M$ on the surface $\Sigma_a$ will be assumed to be given by: 

$$\frac{1}{\Omega(a)} \int_M \frac{d\Sigma}{\operatorname{grad} E},$$

where the measure $\Omega(a) = \int_{\Sigma_a} d\Sigma / \operatorname{grad} E$ of the surface $\Sigma_a$ is the 
structure function of the system. It is obvious that the probability 
field introduced in this way satisfies all necessary 
conditions. The distribution law of the probability vector 
$(x_1, x_2, \dots, x_{2s})$ thus established will be called in the future 
the fundamental distribution law of the system (for $E = a$). 
Let $\varphi(x_1, x_2, \dots, x_{2s})$ be an arbitrary measurable phase 
function of the system $G$. Then the probability of the inequality 

$$\varphi(x_1, x_2, \dots, x_{2s}) < x,$$

where $x$ is an arbitrary real number, will be determined by the 
formula 

$$\mathbf{P}(\varphi < x) = \frac{1}{\Omega(a)} \int_{\varphi < x} \frac{d\Sigma}{\operatorname{grad} E}.$$


Thus, any measurable phase function can be considered as a 
random quantity with a well defined distribution law. The 
mathematical expectation of this quantity, 

$$\mathbf{E}\varphi = \frac{1}{\Omega(a)} \int_{\Sigma_a} \varphi(x_1, \dots, x_{2s}) \, \frac{d\Sigma}{\operatorname{grad} E},$$

coincides (under the assumption of absolute convergence of this 
integral) with the quantity which we called the phase average 
$\bar\varphi$ of the function $\varphi$ in section 7 of Chapter II. In particular, 
if $\varphi$ is an ergodic function, its time average for almost any 
trajectory on the surface $\Sigma_a$ coincides with its mathematical 
expectation $\mathbf{E}\varphi = \bar\varphi$. 

If $\varphi$ is the characteristic function of a certain measurable set 
of points $M$ on the surface $\Sigma_a$ (i.e. if $\varphi = 1$ within $M$ and 
$\varphi = 0$ outside it), $\mathbf{E}\varphi$ obviously gives the probability of finding 
the point $(x_1, x_2, \dots, x_{2s})$ within the set of points $M$. If $\varphi$ 
is an ergodic function this probability coincides with the 
relative mean time spent by the moving point within the set 
$M$ for almost any trajectory located on the surface $\Sigma_a$. 

The fundamental distribution law formulated above permits 
us to introduce convenient probabilistic terminology for the 
ideas connected with evaluation of phase averages. At the 
same time, as we shall see later, this formulation of the funda- 
mental distribution law will permit us to use the well known 
analytical apparatus of the theory of probability for the solu- 
tion of many fundamental problems in statistical mechanics. 
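
As a simple illustration of the fundamental distribution law, one may take a system with one degree of freedom. The sketch below is based on our own assumptions and is not taken from the text: the energy is $E = \tfrac{1}{2}(p^2 + \omega^2 q^2)$ (a harmonic oscillator), the "surface" $\Sigma_a$ is then an ellipse, and the set $M$ consists of the points with $|q| < q_{\max}/2$. For this model $d\Sigma / \operatorname{grad} E$ reduces to $d\theta/\omega$ in the parametrization used below, so that $\Omega(a) = 2\pi/\omega$ and the probability of $M$ equals $1/3$, which is also the fraction of a period the oscillator spends with $|q| < q_{\max}/2$.

```python
import math

def fundamental_probability(indicator, a, omega, steps=100_000):
    """Estimate (1/Omega(a)) * integral over M of dSigma / |grad E| on the
    curve E = a, where E = (p^2 + omega^2 q^2) / 2 and M is given by an
    indicator function of (q, p). Returns (probability, Omega(a))."""
    r = math.sqrt(2.0 * a)
    d_theta = 2.0 * math.pi / steps
    weight_M = 0.0
    weight_all = 0.0
    for i in range(steps):
        th = (i + 0.5) * d_theta
        # Parametrisation of the ellipse E = a.
        q, p = (r / omega) * math.cos(th), r * math.sin(th)
        # Arc-length element dSigma of the parametrised curve.
        d_sigma = math.hypot((r / omega) * math.sin(th), r * math.cos(th)) * d_theta
        # Magnitude of grad E = (omega^2 q, p).
        grad_E = math.hypot(omega * omega * q, p)
        w = d_sigma / grad_E
        weight_all += w
        if indicator(q, p):
            weight_M += w
    return weight_M / weight_all, weight_all

a, omega = 2.0, 3.0
q_max = math.sqrt(2.0 * a) / omega
prob, structure = fundamental_probability(lambda q, p: abs(q) < q_max / 2, a, omega)
```

The computed `structure` approximates the structure function $\Omega(a)$ of this model, and `prob` approximates the probability of the set $M$ under the fundamental distribution law; the agreement of `prob` with the relative time of sojourn is the one-dimensional counterpart of the statement made above for ergodic characteristic functions.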


15. The distribution law of a component and its energy. 
Let a given system $G$ have a component $G_1$ with dynamic coordinates 
$(x_1, x_2, \dots, x_r)$ (the complementary component 
$G_2$ having dynamic coordinates $x_{r+1}, \dots, x_{2s}$). The fundamental 
distribution law assumed for the system $G$, i.e. for the 
multidimensional random quantity $(x_1, \dots, x_{2s})$, uniquely 
determines, according to the well known rules of probability, 
the distribution law for an arbitrary group of dynamical 
variables of the system $G$. 

In particular, the set of variables $(x_1, x_2, \dots, x_r)$ $(r < 2s)$ 
or, as we shall say for brevity, the component $G_1$ is subject to 
a definite distribution law in the space of $r$ dimensions, which, 
of course, coincides with its phase space. Let us now find this 
distribution law. 

Let $M_1$ be a measurable set in the phase space $\Gamma_1$ of the
component $G_1$, in which $(x_1, x_2, \ldots, x_r)$ serve as Cartesian
coordinates. Further, let $M$ be the set of points in the phase
space $\Gamma$ of the system $G$ for which the first $r$ coordinates
represent a point in the space $\Gamma_1$ belonging to the set $M_1$ (so
that the point $(x_1, \ldots, x_r)$ belongs to the set $M_1$ when
and only when the point $(x_1, \ldots, x_{2s})$ belongs to the set $M$).
The probability that the representative point $P_1$ of the
component $G_1$ falls within the set $M_1$ coincides with the probability
that the representative point $P$ of the system $G$ falls
within the set $M$ (both probabilities being determined, as
usual, under the assumption that $E = a$), so that we have
(denoting by $AB$ the intersection, i.e. the common part, of the
sets $A$ and $B$):

$$\mathsf{P}(P_1 \in M_1) = \mathsf{P}(P \in M) = \frac{1}{\Omega(a)} \int_{M\Sigma_a} \frac{d\Sigma}{\operatorname{grad} E} = \frac{1}{\Omega(a)} \int_{\Sigma_a} f \, \frac{d\Sigma}{\operatorname{grad} E},$$

where $f$ stands for the previously defined characteristic
function of the set $M$. Because of the general theorem (section
7, Chapter II) this gives us:

$$\mathsf{P}(P_1 \in M_1) = \frac{1}{\Omega(a)} \frac{d}{da} \int_{V_a} f \, dV, \tag{23}$$

where $dV$ stands for the volume element in the phase space $\Gamma$
of the system $G$, i.e. $dV = dx_1 \cdots dx_{2s}$.


Since the function $f$ is independent of the variables
$x_{r+1}, \ldots, x_{2s}$, we conclude that

$$\int_{V_a} f \, dV = \int_{(V_1)_a} f \, dV_1 \int_{(V_2)_{a - E_1}} dV_2,$$

where $dV_1 = dx_1 \cdots dx_r$, $dV_2 = dx_{r+1} \cdots dx_{2s}$;
$E_1 = E_1(x_1, \ldots, x_r)$ and $E_2 = E_2(x_{r+1}, \ldots, x_{2s})$ represent
the energies of the components $G_1$ and $G_2$, whereas $(V_2)_{a - E_1}$
is the set of points in the space $\Gamma_2$ for which $E_2 < a - E_1$.
The outer integration can be extended through the entire space
$\Gamma_1$ without changing the result, since for $E_1 > a$ the inner
integral vanishes.

In the above expression the inner integral represents the
volume of that part of the phase space $\Gamma_2$ of the component
$G_2$ where $E_2 < a - E_1$. Denoting it as usual by $V_2(a - E_1)$
we will have:

$$\int_{V_a} f \, dV = \int_{\Gamma_1} f \, V_2(a - E_1) \, dV_1 = \int_{M_1} V_2(a - E_1) \, dV_1,$$

consequently:

$$\frac{d}{da} \int_{V_a} f \, dV = \int_{M_1} \Omega_2(a - E_1) \, dV_1,$$

since it is clear from the definition of the structure function
that $V_2'(x)$ coincides with the structure function $\Omega_2(x)$ of the
component $G_2$. Thus the relation (23) gives us

$$\mathsf{P}(P_1 \in M_1) = \frac{1}{\Omega(a)} \int_{M_1} \Omega_2(a - E_1) \, dV_1. \tag{24}$$

It must be remembered that in the above expression
$dV_1 = dx_1 \cdots dx_r$ and $E_1$ is a function of $x_1, \ldots, x_r$.

We see that in the case where the energy of the system $G$
is equal to $a$, the distribution law of the component $G_1$ in its
phase space $\Gamma_1$ is given by the density function

$$\frac{\Omega_2(a - E_1)}{\Omega(a)}, \tag{25}$$

where $\Omega_2(x)$ is the structure function of the complementary
component $G_2$. This fact permits us to write the expression
for the phase average of any function depending on $x_1, \ldots, x_r$
in the form of an integral extended over the space $\Gamma_1$.
Indeed, let $f(x_1, \ldots, x_r)$ be such a function. We know that
its phase average $\bar{f}$ coincides with the mathematical
expectation $\mathsf{E}f$ which, according to the above results, can be written
in the form

$$\bar{f} = \mathsf{E}f = \frac{1}{\Omega(a)} \int_{\Gamma_1} f \, \Omega_2(a - E_1) \, dV_1. \tag{26}$$

The most important function of the above type is the energy
$E_1 = E_1(x_1, \ldots, x_r)$ of the component $G_1$. Because of (26)
we have:

$$\bar{E}_1 = \mathsf{E}E_1 = \frac{1}{\Omega(a)} \int_{\Gamma_1} E_1 \, \Omega_2(a - E_1) \, dV_1.$$


However, because of the particular importance of the
quantity $E_1$, we will not limit ourselves to establishing only its
mean value, but we will also find its distribution law.

We have seen that the aggregate $(x_1, \ldots, x_r)$ of the
dynamical coordinates of the component $G_1$ is a multi-dimensional
random quantity distributed in the space $\Gamma_1$ with the density

$$\frac{\Omega_2(a - E_1)}{\Omega(a)}.$$

Accordingly, the probability that $g_1 < E_1 < g_2$ is given by:

$$\mathsf{P}(g_1 < E_1 < g_2) = \frac{1}{\Omega(a)} \int_{g_1 < E_1 < g_2} \Omega_2(a - E_1) \, dV_1.$$


According to formula (18) Chapter II, this multiple integral
in the space of $r$ dimensions can be written in the form of a
simple integral

$$\int_{g_1}^{g_2} \Omega_2(a - E_1) \, \Omega_1(E_1) \, dE_1,$$

which brings us to the relation

$$\mathsf{P}(g_1 < E_1 < g_2) = \frac{1}{\Omega(a)} \int_{g_1}^{g_2} \Omega_1(x) \, \Omega_2(a - x) \, dx.$$

Thus the random quantity $E_1$ is subject to the probability density

$$\frac{\Omega_1(x) \, \Omega_2(a - x)}{\Omega(a)}. \tag{27}$$


This permits us to express the phase average of any function
$f(E_1)$ of the energy of the component $G_1$ in the form of an
ordinary integral:

$$\bar{f} = \mathsf{E}f(E_1) = \frac{1}{\Omega(a)} \int f(x) \, \Omega_1(x) \, \Omega_2(a - x) \, dx. \tag{28}$$

In particular:

$$\bar{E}_1 = \mathsf{E}E_1 = \frac{1}{\Omega(a)} \int x \, \Omega_1(x) \, \Omega_2(a - x) \, dx. \tag{29}$$

In the last two formulae the integrals can be taken between
infinite limits; in fact, since the integrated function is different
from zero only for $0 < x < a$, there is no divergence difficulty.
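The density (27) can be checked numerically on a simple model. The sketch below assumes, purely for illustration, structure functions of the form $x^{k-1}/\Gamma(k)$; this family, the exponents and the helper names are not taken from the text. For such functions the structure function of the composite system has the same form with exponent $k_1 + k_2$, so the normalization of (27) and the mean energy (29) can be evaluated by direct integration.

```python
import math

def omega(x, k):
    """Assumed toy structure function x^(k-1)/Gamma(k) for x > 0, zero otherwise."""
    return x ** (k - 1) / math.gamma(k) if x > 0 else 0.0

def energy_density(x, a, k1, k2):
    """Density (27): Omega_1(x) * Omega_2(a - x) / Omega(a).

    For the toy family the composite structure function has exponent
    k1 + k2 (an assumption of this sketch, not a statement of the text)."""
    return omega(x, k1) * omega(a - x, k2) / omega(a, k1 + k2)

def integrate(f, lo, hi, steps=20000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

a, k1, k2 = 10.0, 2.0, 6.0
total = integrate(lambda x: energy_density(x, a, k1, k2), 0.0, a)        # normalization of (27)
mean_e1 = integrate(lambda x: x * energy_density(x, a, k1, k2), 0.0, a)  # formula (29)
print(total, mean_e1)
```

In this model the energy is shared in proportion to the exponents, so the computed mean should lie close to $a k_1/(k_1 + k_2)$.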

In applications we will usually encounter phase functions
which depend on the dynamical coordinates of some component
of the given system, and which essentially involve the energy
of this component. As we have just seen, the distribution law
for the energy of the given component, as well as for its
dynamic variables, contains the structure functions $\Omega$, $\Omega_1$, and
$\Omega_2$. (It may be noted that the general formulae determining
the mean values of an arbitrary phase function on the surface
$\Sigma_a$ also contain the quantity $\Omega(a)$.) Thus it is clear that any
analytical method of deriving the approximate formulae for the
mean values of the phase functions used in statistical mechanics
must first of all give convenient approximate expressions for
the structure functions. Accordingly, in our approach to the
problem, we will try to use the fact that the systems usually
considered in statistical mechanics consist of a very large
number of similar components. Using the methods of the theory
of probability we will be able to establish for the structure
functions of such systems approximate expressions which
are to a large extent independent of the nature of the individual
components.


16. Generating functions. Let us consider a system $G$ whose
structure function $\Omega(x)$ is subject to the usual conditions: it
is positive and monotonically increasing for $x > 0$, it is equal
to zero for $x < 0$, it is continuous and increases without bound
for $x \to \infty$. In addition, we will require the integral

$$\Phi(\alpha) = \int_0^\infty e^{-\alpha x} \Omega(x) \, dx \tag{30}$$

to converge for any $\alpha > 0$ (it may be remarked that this
condition is satisfied in all actual physical problems).

In the future we will call the function $\Phi(\alpha)$ (which is
nothing else but the "Laplace transform" of the structure
function $\Omega(x)$) the generating function of the system $G$, because
of its fundamental role in our analytical method. For the same
reason we will discuss in more detail the fundamental
properties of such generating functions.

Each generating function is completely determined for all
positive values of its argument by the expression (30); only
this case will be considered. From the definition of the
generating function it follows that:

(1) $\Phi(\alpha)$ is a positive and monotonically decreasing function of $\alpha$.
(2) $\Phi(\alpha) \to \infty$ for $\alpha \to 0$.

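Furthermore, it is easy to prove that:

(3) For any $\alpha > 0$, $\Phi(\alpha)$ has derivatives of all orders. For
$n = 1, 2, \ldots$

$$\Phi^{(n)}(\alpha) = (-1)^n \int_0^\infty x^n e^{-\alpha x} \Omega(x) \, dx. \tag{31}$$

In fact, for any positive number $\alpha_0$, and for any large
number $n$, we have, for sufficiently large $x$ and $\alpha \ge \alpha_0$:

$$x^n e^{-\alpha x} < e^{\frac{1}{2}\alpha_0 x} e^{-\alpha x} \le e^{-\frac{1}{2}\alpha_0 x}.$$

From this it follows that the integral in the expression (31)
converges uniformly for $\alpha \ge \alpha_0$.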
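Formulas (30) and (31) lend themselves to a direct numerical check. The sketch below assumes an illustrative structure function $x^{K-1}/\Gamma(K)$ with $K = 2.5$, for which the generating function is $\alpha^{-K}$ in closed form; the model, the truncation of the integral at a finite upper limit and all function names are assumptions of this example.

```python
import math

K = 2.5  # assumed exponent of the toy structure function

def omega(x):
    """Assumed toy structure function x^(K-1)/Gamma(K) for x > 0, zero otherwise."""
    return x ** (K - 1) / math.gamma(K) if x > 0 else 0.0

def moment_integral(alpha, n, upper=100.0, steps=100000):
    """Midpoint approximation of the integral of x^n e^(-alpha x) Omega(x) over (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** n * math.exp(-alpha * x) * omega(x)
    return h * total

def phi_derivative(alpha, n):
    """Right-hand side of (31); n = 0 gives the generating function (30) itself."""
    return (-1) ** n * moment_integral(alpha, n)

def phi_derivative_exact(alpha, n):
    """Closed form for the toy model, where Phi(alpha) = alpha^(-K)."""
    return (-1) ** n * math.gamma(K + n) / math.gamma(K) * alpha ** (-(K + n))

for order in (0, 1, 2):
    print(order, phi_derivative(0.8, order), phi_derivative_exact(0.8, order))
```

The signs alternate with the order of the derivative, in agreement with property (1): the function itself is positive and its first derivative negative.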

We also notice that, since $\Phi(\alpha)$ is always positive, the function
$\log \Phi(\alpha)$ also possesses all the properties mentioned above
(except, of course, being positive); in particular any generating
function has logarithmic derivatives of all orders.

(4) The second logarithmic derivative of the function $\Phi(\alpha)$ is
always positive for $\alpha > 0$.

In fact direct calculation shows that:

$$\frac{d^2 \log \Phi(\alpha)}{d\alpha^2} = \frac{\Phi(\alpha)\Phi''(\alpha) - [\Phi'(\alpha)]^2}{[\Phi(\alpha)]^2} = \frac{1}{\Phi(\alpha)} \int_0^\infty \left( x + \frac{\Phi'(\alpha)}{\Phi(\alpha)} \right)^2 e^{-\alpha x} \Omega(x) \, dx > 0.$$
From this follows:

(5) The equation

$$-\frac{\Phi'(\alpha)}{\Phi(\alpha)} = a$$

has one single positive solution for any $a > 0$.

In fact consider the function

$$\Phi_a(\alpha) = e^{\alpha a} \Phi(\alpha).$$

Because of the property (2), $\Phi_a(\alpha) \to \infty$ for $\alpha \to 0$, and since

$$\Phi_a(\alpha) > e^{\alpha a} \int_0^{a/2} e^{-\alpha x} \Omega(x) \, dx > e^{\alpha a/2} \int_0^{a/2} \Omega(x) \, dx,$$

we can conclude that $\Phi_a(\alpha) \to \infty$ also for $\alpha \to \infty$. It is also
apparent that the function $\log \Phi_a(\alpha)$ possesses the same
properties. However, $\log \Phi_a(\alpha)$ is a convex function since its second
derivative, coinciding with the second derivative of $\log \Phi(\alpha)$, is always
positive because of the property (4); this shows that the
function $\log \Phi_a(\alpha)$, becoming infinite for $\alpha \to 0$ and $\alpha \to \infty$, must
necessarily possess a single minimum. At the point of minimum

$$\frac{d \log \Phi_a(\alpha)}{d\alpha} = a + \frac{\Phi'(\alpha)}{\Phi(\alpha)} = 0,$$

which proves our statement.
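Property (5) also suggests a practical way of finding the root: since the second logarithmic derivative is positive, $-d \log \Phi/d\alpha$ decreases monotonically, and a bisection must converge. The sketch below assumes a toy system whose molecules have generating functions $\alpha^{-k}$; the species list, the finite-difference step and the function names are illustrative choices, not part of the text.

```python
import math

def log_phi(alpha, species):
    """log Phi(alpha) for an assumed toy system: for each pair (n, k) there are
    n molecules with generating function alpha^(-k), so Phi is their product."""
    return sum(-n * k * math.log(alpha) for n, k in species)

def mean_energy(alpha, species):
    """-d log Phi / d alpha, by a central finite difference with a relative step."""
    h = 1e-6 * alpha
    return -(log_phi(alpha + h, species) - log_phi(alpha - h, species)) / (2 * h)

def solve_root(a, species, tol=1e-12):
    """Bisection for the single positive root of -d log Phi/d alpha = a.
    Uses the monotone decrease of the left-hand side implied by property (4)."""
    lo, hi = 1.0, 1.0
    while mean_energy(lo, species) < a:   # shrink lo until the value exceeds a
        lo /= 2
    while mean_energy(hi, species) > a:   # grow hi until the value drops below a
        hi *= 2
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if mean_energy(mid, species) > a:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

species = [(100, 1.5), (50, 2.5)]   # hypothetical mixture of two kinds of molecules
root = solve_root(30.0, species)
print(root)
```

For this model the left-hand side equals $\sum n k / \alpha$, so the root can be compared with the closed form $\sum n k / a$.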
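The most important property of generating functions is their
law of composition, i.e. the law by which the generating
function of a system is constructed from the generating functions of its
components. Let the system $G$ consist of two components $G_1$ and
$G_2$ with the structure functions $\Omega_1(x)$ and $\Omega_2(x)$ and the
generating functions $\Phi_1(\alpha)$ and $\Phi_2(\alpha)$. Since, according to the
formula (20) section 8, Chapter II,

$$\Omega(x) = \int_0^x \Omega_1(x - y) \, \Omega_2(y) \, dy,$$

we have

$$\begin{aligned}
\Phi(\alpha) &= \int_0^\infty e^{-\alpha x} \Omega(x) \, dx = \int_0^\infty e^{-\alpha x} \, dx \int_0^x \Omega_1(x - y) \, \Omega_2(y) \, dy \\
&= \int_0^\infty \Omega_2(y) e^{-\alpha y} \, dy \int_y^\infty \Omega_1(x - y) e^{-\alpha (x - y)} \, dx \\
&= \int_0^\infty \Omega_2(y) e^{-\alpha y} \, dy \int_0^\infty \Omega_1(z) e^{-\alpha z} \, dz = \Phi_1(\alpha) \Phi_2(\alpha).
\end{aligned}$$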


It is clear that, using the method of mathematical induction,
we can generalize this result to systems consisting of
many components. Thus we come to the following rule:

(6) The generating function of a system $G$ is equal to the product
of the generating functions of its components.

Thus, for example, if $G$ is a gas consisting of $n$ identical
molecules and if $\varphi(\alpha)$ is the generating function of a single molecule,
we have

$$\Phi(\alpha) = [\varphi(\alpha)]^n.$$

If $G$ is a mixture of two gases consisting of $n_1$ molecules with
the generating function $\varphi_1(\alpha)$ and $n_2$ molecules with the
generating function $\varphi_2(\alpha)$, we have:

$$\Phi(\alpha) = [\varphi_1(\alpha)]^{n_1} [\varphi_2(\alpha)]^{n_2},$$

etc.

Thus we see that for a composite mechanical system,
generating functions are subject to a much simpler composition
law than the structure functions. This property of
generating functions makes them particularly convenient for
the study of systems consisting of a large number of
components.
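The composition rule (6) can be illustrated numerically: one composes two structure functions by the convolution formula and compares the generating function of the result with the product of the generating functions of the parts. The sketch below is a rough check under assumed toy structure functions $x^{k-1}/\Gamma(k)$; the exponents, the value of $\alpha$, the grid sizes and the helper names are all illustrative.

```python
import math

def omega(x, k):
    """Assumed toy structure function x^(k-1)/Gamma(k) for x > 0, zero otherwise."""
    return x ** (k - 1) / math.gamma(k) if x > 0 else 0.0

def convolve(f1, f2, x, steps=200):
    """Composition of structure functions: integral of f1(x - y) f2(y) over 0 < y < x."""
    if x <= 0:
        return 0.0
    h = x / steps
    return h * sum(f1(x - (j + 0.5) * h) * f2((j + 0.5) * h) for j in range(steps))

def generating_function(f, alpha, upper=30.0, steps=600):
    """Midpoint approximation of the integral (30), truncated at `upper`."""
    h = upper / steps
    return h * sum(math.exp(-alpha * (i + 0.5) * h) * f((i + 0.5) * h) for i in range(steps))

k1, k2, alpha = 2.0, 3.0, 1.2
omega1 = lambda x: omega(x, k1)
omega2 = lambda x: omega(x, k2)
omega_total = lambda x: convolve(omega1, omega2, x)   # structure function of the whole system

phi1 = generating_function(omega1, alpha)
phi2 = generating_function(omega2, alpha)
phi_total = generating_function(omega_total, alpha)
print(phi_total, phi1 * phi2)
```

The two printed numbers should agree up to the discretization error of the midpoint rule, which is the content of rule (6) for two components.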

We may also remark that, on the basis of the general formula
(19) Chapter II, the generating function $\Phi(\alpha)$ of the system
$G$ can be expressed in the form

$$\Phi(\alpha) = \int_\Gamma e^{-\alpha E} \, dV,$$

where $E$ is the total energy of the system $G$ considered as a
function of the coordinates in the phase space $\Gamma$.


17. Conjugate distribution laws. Consider again the
system $G$ discussed in the previous section and use the same notation
as before. Put

$$U^{(\alpha)}(x) = \begin{cases} \dfrac{1}{\Phi(\alpha)} e^{-\alpha x} \Omega(x) & (x \ge 0), \\[1ex] 0 & (x < 0). \end{cases} \tag{33}$$

Since

$$U^{(\alpha)}(x) \ge 0, \qquad \int U^{(\alpha)}(x) \, dx = 1,$$

we conclude that $U^{(\alpha)}(x)$ represents a probability density for
any $\alpha > 0$. For different $\alpha$'s we obtain an entire family of
distribution laws. It is clear that this family is completely
determined by the structure of the system $G$. We will call it
the family of distribution laws conjugate with the system $G$.

Conversely, the structure function $\Omega(x)$ can be found from
any of the conjugate distribution functions $U^{(\alpha)}(x)$ by means
of the formula

$$\Omega(x) = \Phi(\alpha) e^{\alpha x} U^{(\alpha)}(x). \tag{34}$$


The mathematical expectation and the dispersion of a quantity
distributed according to the conjugate law $U^{(\alpha)}(x)$ can be
expressed in a simple way through the generating function
$\Phi(\alpha)$ and its derivatives. In fact:

$$\int x \, U^{(\alpha)}(x) \, dx = \frac{1}{\Phi(\alpha)} \int x e^{-\alpha x} \Omega(x) \, dx = -\frac{\Phi'(\alpha)}{\Phi(\alpha)} = -\frac{d \log \Phi(\alpha)}{d\alpha}. \tag{35}$$

Remembering the property (5) of the generating functions we
deduce an important theorem:

Theorem: For any positive number $a$, there is exactly one function
among the conjugate functions $U^{(\alpha)}(x)$ which
has mathematical expectation $a$.


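Furthermore, the dispersion corresponding to $U^{(\alpha)}(x)$ is
given by the expression:

$$\begin{aligned}
\int \left( x + \frac{\Phi'(\alpha)}{\Phi(\alpha)} \right)^2 U^{(\alpha)}(x) \, dx
&= \int x^2 U^{(\alpha)}(x) \, dx - \left[ \frac{\Phi'(\alpha)}{\Phi(\alpha)} \right]^2 \\
&= \frac{1}{\Phi(\alpha)} \int x^2 e^{-\alpha x} \Omega(x) \, dx - \left[ \frac{\Phi'(\alpha)}{\Phi(\alpha)} \right]^2 \\
&= \frac{\Phi(\alpha)\Phi''(\alpha) - [\Phi'(\alpha)]^2}{[\Phi(\alpha)]^2} = \frac{d^2 \log \Phi(\alpha)}{d\alpha^2}.
\end{aligned} \tag{36}$$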
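Formulas (35) and (36) say that the first two logarithmic derivatives of the generating function give the expectation and the dispersion of the conjugate law. As a rough numerical check, the sketch below integrates an assumed toy conjugate density directly and compares the result with finite differences of $\log \Phi$; the structure function $x^{K-1}/\Gamma(K)$, the step sizes and the names are assumptions made for this example.

```python
import math

K = 2.5                       # assumed exponent of the toy structure function
GAMMA_K = math.gamma(K)

def omega(x):
    """Assumed toy structure function x^(K-1)/Gamma(K) for x > 0, zero otherwise."""
    return x ** (K - 1) / GAMMA_K if x > 0 else 0.0

def integrate(f, upper=80.0, steps=80000):
    """Midpoint rule on (0, upper); the integrands used here decay exponentially."""
    h = upper / steps
    return h * sum(f((i + 0.5) * h) for i in range(steps))

def phi(alpha):
    """Generating function (30), evaluated numerically."""
    return integrate(lambda x: math.exp(-alpha * x) * omega(x))

def conjugate(alpha):
    """Conjugate density (33) for the parameter alpha."""
    norm = phi(alpha)
    return lambda x: math.exp(-alpha * x) * omega(x) / norm

alpha, step = 1.5, 1e-3
u = conjugate(alpha)
mean = integrate(lambda x: x * u(x))
dispersion = integrate(lambda x: (x - mean) ** 2 * u(x))

# Finite differences of log Phi for the right-hand sides of (35) and (36).
log_phi = {s: math.log(phi(alpha + s * step)) for s in (-1, 0, 1)}
mean_from_phi = -(log_phi[1] - log_phi[-1]) / (2 * step)
dispersion_from_phi = (log_phi[1] - 2 * log_phi[0] + log_phi[-1]) / step ** 2
print(mean, mean_from_phi, dispersion, dispersion_from_phi)
```

For this model the expectation and the dispersion are $K/\alpha$ and $K/\alpha^2$, which gives an independent check of both columns.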


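Finally, the composition law of the structure functions

$$\Omega(x) = \int_0^x \Omega_1(x - y) \, \Omega_2(y) \, dy,$$

together with the expression (34) and the composition law of
the generating functions, gives us (where $U_1^{(\alpha)}(x)$ and $U_2^{(\alpha)}(x)$ represent
the conjugate distribution functions of the components $G_1$ and
$G_2$):

$$\begin{aligned}
\Phi(\alpha) e^{\alpha x} U^{(\alpha)}(x)
&= \int_0^x \Phi_1(\alpha) e^{\alpha (x - y)} U_1^{(\alpha)}(x - y) \, \Phi_2(\alpha) e^{\alpha y} U_2^{(\alpha)}(y) \, dy \\
&= \Phi_1(\alpha) \Phi_2(\alpha) e^{\alpha x} \int_0^x U_1^{(\alpha)}(x - y) \, U_2^{(\alpha)}(y) \, dy \\
&= \Phi(\alpha) e^{\alpha x} \int_0^x U_1^{(\alpha)}(x - y) \, U_2^{(\alpha)}(y) \, dy,
\end{aligned}$$

from which follows:

$$U^{(\alpha)}(x) = \int_0^x U_1^{(\alpha)}(x - y) \, U_2^{(\alpha)}(y) \, dy. \tag{37}$$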


Using the method of mathematical induction, we can
generalize this important formula to the case where the system
$G$ consists of any number $n$ of components with conjugate
distributions $U_1^{(\alpha)}(x), U_2^{(\alpha)}(x), \ldots, U_n^{(\alpha)}(x)$. We have

$$U^{(\alpha)}(x) = \int \cdots \int \left\{ \prod_{k=1}^{n-1} U_k^{(\alpha)}(x_k) \, dx_k \right\} U_n^{(\alpha)}\!\left( x - \sum_{k=1}^{n-1} x_k \right). \tag{38}$$

This is the composition law well known in the theory of
probability; it allows one to express the distribution of the
sum of $n$ independent random quantities in terms of the
probability densities of the individual terms. We are led to
the following rule concerning the composition of the conjugate
distributions: The conjugate distribution law of a given system
can be derived from the corresponding distributions of its $n$
components in the same way as the distribution of a sum of $n$ mutually
independent random quantities is found from the distributions of the
individual terms.

It is clear that the value of the parameter $\alpha$ is quite arbitrary,
but must be the same for all systems in question.


18. Systems consisting of a large number of components.
Consider a system $G$ consisting of the components $G_1, G_2, \ldots, G_n$,
where $n$ is a very large number. According to the formula
(34) of the previous section:

$$\Omega(x) = \Phi(\alpha) e^{\alpha x} U^{(\alpha)}(x).$$

We will try now to obtain a convenient approximate expression
for the function $\Omega(x)$. The above expression for this function
contains, apart from the elementary function $e^{\alpha x}$, the
generating function $\Phi(\alpha)$ and the conjugate distribution function
$U^{(\alpha)}(x)$. As can be easily seen and as we shall soon prove, the
presence of the function $\Phi(\alpha)$ does not lead to any difficulties
because of the extremely simple composition law (section 16)
governing the generating functions; in fact we have already
seen that $\Phi(\alpha)$, being independent of $x$, plays the role of only
a constant factor in the expression for the function $\Omega(x)$. Thus,
the principal difficulty in our problem consists in finding a
convenient approximate expression for the conjugate
distribution function $U^{(\alpha)}(x)$.

We are helped here by the analytical methods of the theory
of probability. In the previous section we have seen that $U^{(\alpha)}(x)$
represents the probability density for the sum of $n$ independent random
quantities ($n$ being in the present case a very large number).
For such cases the limit theorems of the theory of probability 
supply us with simple, convenient, and rather exact analytical 
approximations, the form of which does not depend on the 
special nature of the laws governing the separate components. 
These laws have only a small number of parameters entering 
into the approximate expressions. Thus, not having detailed 
information concerning the structure of the separate com- 
ponents of the system G and basing our conclusions exclusively 
on the very large numbers of these components we can arrive 
at important conclusions concerning the properties of this 
system. This result is typical of any application of the theory 
of probability and demonstrates its principal advantage in the 
study of mass phenomena. 

Let us remark that the value of the parameter $\alpha$ still
remains entirely arbitrary, so that we can use this extra degree
of freedom for the simplification of the future calculations.
Thus, rather than creating a special analytical formalism
for the purposes of statistical mechanics, we plan to use in all
future calculations the conventional formalism of the theory
of probability. The next chapter will be devoted to the
discussion of the fundamental steps to be taken along this line.


CHAPTER V 


APPLICATION OF THE 
CENTRAL LIMIT THEOREM 


19. Approximate expressions of structure functions. The
most convenient formulation of the so called "central limit
theorem" of the theory of probability, which gives an
approximate expression for the distribution law governing the
sum of a large number of mutually independent random
quantities, can be given in the following form:

Consider a sequence of mutually independent random quantities
with probability densities $u_1(x), u_2(x), \ldots$, and characteristic
functions $g_1(t), g_2(t), \ldots$, so that

$$g_k(t) = \int e^{itx} u_k(x) \, dx \qquad (k = 1, 2, \ldots).$$

Let

$$\int x \, u_k(x) \, dx = a_k, \qquad \int (x - a_k)^2 u_k(x) \, dx = b_k, \qquad \int |x - a_k|^3 u_k(x) \, dx = c_k,$$

$$\int (x - a_k)^4 u_k(x) \, dx = d_k, \qquad \int |x - a_k|^5 u_k(x) \, dx = e_k,$$

and assume that the given distribution laws are subject to the
following conditions:

(1) The functions $u_k(x)$ are differentiable and there exists a
constant $L$ such that

$$\int |u_k'(x)| \, dx < L \qquad (k = 1, 2, \ldots).$$

(2) There exist positive constants $\alpha$ and $\beta$ $(\alpha < \beta)$ such that:

$$\alpha < b_k < \beta, \quad c_k < \beta, \quad d_k < \beta, \quad e_k < \beta \qquad (k = 1, 2, \ldots).$$

(3) There exist positive constants $\lambda$ and $\tau$ such that in the region
$|t| < \tau$:

$$|g_k(t)| > \lambda \qquad (k = 1, 2, \ldots).$$

(4) For each interval $(c_1, c_2)$ $(c_1 c_2 > 0)$ there exists a number
$\rho = \rho(c_1, c_2) < 1$ such that for any $t$ in the interval $(c_1, c_2)$:

$$|g_k(t)| < \rho \qquad (k = 1, 2, \ldots).$$

Let us put $A_n = \sum_{k=1}^{n} a_k$, $B_n = \sum_{k=1}^{n} b_k$ and write $U_n(x)$
for the probability density of the sum of the first $n$ random
quantities. Then, for $n \to \infty$:

$$U_n(x) = \frac{1}{(2\pi B_n)^{1/2}} \exp\left[ -\frac{(x - A_n)^2}{2B_n} \right] + \begin{cases} O\!\left( \dfrac{1 + |x - A_n|}{n^{3/2}} \right) & \text{for } |x - A_n| < 2\log^2 n, \\[2ex] O\!\left( \dfrac{1}{n} \right) & \text{for all } x. \end{cases} \tag{39}$$

The proof of this theorem is given in the appendix, together 
with a more exact formulation which, however, is not necessary 
for the purposes of the present chapter. 
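The content of (39) can be illustrated on a case where the density of the sum is known in closed form. The sketch below assumes identically distributed terms with density $x^{k-1} e^{-x}/\Gamma(k)$, $k = 3$ (so that each term has expectation $k$ and dispersion $k$, and the sum of $n$ terms has a gamma density with parameter $nk$); this choice, the window around $A_n$ and the function names are assumptions made for the example. It measures the deviation from the Gaussian term in the scale $n^{3/2}/(1 + |x - A_n|)$ appearing in (39).

```python
import math

def exact_sum_density(x, n, k=3):
    """Density of the sum of n independent terms, each with the assumed density
    x^(k-1) e^(-x) / Gamma(k); the sum then has the density x^(nk-1) e^(-x)/Gamma(nk)."""
    if x <= 0:
        return 0.0
    return math.exp((n * k - 1) * math.log(x) - x - math.lgamma(n * k))

def gaussian_term(x, A, B):
    """Leading term of (39)."""
    return math.exp(-(x - A) ** 2 / (2 * B)) / math.sqrt(2 * math.pi * B)

def scaled_error(n, k=3, half_width=10.0, points=201):
    """Largest value of |U_n - Gaussian term| * n^(3/2) / (1 + |x - A_n|)
    over a grid of points with |x - A_n| <= half_width."""
    A, B = n * k, n * k          # each term has expectation k and dispersion k
    worst = 0.0
    for j in range(points):
        x = A - half_width + 2 * half_width * j / (points - 1)
        err = abs(exact_sum_density(x, n, k) - gaussian_term(x, A, B))
        worst = max(worst, err * n ** 1.5 / (1 + abs(x - A)))
    return worst

errors = [scaled_error(n) for n in (50, 200, 800)]
print(errors)
```

If the estimate (39) holds for this model, the printed values stay bounded as $n$ grows instead of increasing.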

As indicated at the end of the previous chapter, we must
use the central limit theorem for estimating the conjugate
distribution function $U^{(\alpha)}(x)$ of the given system $G$ under the
assumption that the system consists of a very large number of
components $g_1, g_2, \ldots, g_n$, with the structure functions
$\omega_1(x), \omega_2(x), \ldots, \omega_n(x)$, the generating functions
$\varphi_1(\alpha), \varphi_2(\alpha), \ldots, \varphi_n(\alpha)$ and the conjugate functions
$u_1^{(\alpha)}(x), u_2^{(\alpha)}(x), \ldots, u_n^{(\alpha)}(x)$.
Since the latter functions play the role here of the
functions $u_k(x)$ in the formulation of the limit theorem, we
must first make sure that the conjugate functions for
actual physical systems satisfy the conditions assumed in the
proof of the limit theorem. This, however, does not present
any difficulty.

The point is that the conditions imposed on the functions
$u_k(x)$ in the limit theorem amount to assuming the
uniformity of one or another property which they describe.
However, in statistical physics the separate components $g_k$
(molecules, atoms etc.) are always either of the same kind
(homogeneous substance) or of a small number of different
kinds (a mixture of several homogeneous substances). Thus, the
structure functions and consequently also the conjugate
functions for these components form a set within which all elements
are either identical or break up into a small number of groups
of identical elements. It is clear that under such conditions each
characteristic of the functions $u_k^{(\alpha)}(x)$ appears uniformly in the
entire set.

Let us consider now the separate conditions prescribed by the
limit theorem. The structure function $\omega_k(x)$, as well as its
derivative, is usually an analytic function which does not
increase faster than a certain power of $x$ when $x \to \infty$; since

$$u_k^{(\alpha)}(x) = \frac{1}{\varphi_k(\alpha)} e^{-\alpha x} \omega_k(x),$$

the condition (1) is always satisfied.

The functions $u_k^{(\alpha)}(x)$ obviously always possess finite
moments of all orders, whereas the uniformity restrictions on these
moments follow directly from the above general remarks; thus
the condition (2) is also always satisfied. The situation with the
conditions (3) and (4) pertaining to the characteristic functions
is even simpler. In fact, the condition (3) demands that $g_k(t)$
does not become zero for sufficiently small $t$; this is a property
common to any characteristic function. The condition (4)
demands only that $|g_k(t)| \ne 1$ for $t \ne 0$; this also is a property
common to the characteristic functions of any continuous
distribution law.

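Thus we see that the use of the central limit theorem for
the estimates of the conjugate distributions of a mechanical
system gives definite answers. Introducing into the formula (34)
(section 17 of the previous chapter) the approximate expression
of $U^{(\alpha)}(x)$ as given by the expression (39) of the present
section, and taking into account the formulae (35) and (36)
(section 17, Chapter IV), we must let

$$A_n = -\frac{d \log \Phi(\alpha)}{d\alpha}, \qquad B_n = \frac{d^2 \log \Phi(\alpha)}{d\alpha^2},$$

and thus obtain:

$$\Omega(x) = \Phi(\alpha) e^{\alpha x} \left[ \frac{1}{\left( 2\pi \dfrac{d^2 \log \Phi(\alpha)}{d\alpha^2} \right)^{1/2}} \exp\left\{ -\frac{\left( x + \dfrac{d \log \Phi(\alpha)}{d\alpha} \right)^2}{2 \dfrac{d^2 \log \Phi(\alpha)}{d\alpha^2}} \right\} + \begin{cases} O\!\left( \dfrac{1 + \left| x + \frac{d \log \Phi(\alpha)}{d\alpha} \right|}{n^{3/2}} \right) & \text{for } \left| x + \dfrac{d \log \Phi(\alpha)}{d\alpha} \right| < 2\log^2 n, \\[2ex] O\!\left( \dfrac{1}{n} \right) & \text{for all } x \end{cases} \right]. \tag{40}$$

We shall use this formula as a starting point in all future
calculations.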

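Let us now consider the choice of the parameter $\alpha$ which so
far has been arbitrary. In all cases when we speak about some
system $G$ with the constant energy $a$ and about the different
components of this system we will choose for $\alpha$ the single root
(cf. property (5), section 16, Chapter IV) of the equation:

$$-\frac{d \log \Phi(\alpha)}{d\alpha} = a.$$

This value of the parameter $\alpha$ we will denote in the future
by $\vartheta$. We will also put:

$$\left( \frac{d^2 \log \Phi(\alpha)}{d\alpha^2} \right)_{\alpha = \vartheta} = B.$$

Using these notations for the structure function $\Omega(x)$ of the
basic system $G$, we obtain from the formula (40) for $\alpha = \vartheta$
the following expression:

$$\Omega(x) = \Phi(\vartheta) e^{\vartheta x} \left[ \frac{1}{(2\pi B)^{1/2}} \exp\left\{ -\frac{(x - a)^2}{2B} \right\} + \begin{cases} O\!\left( \dfrac{1 + |x - a|}{n^{3/2}} \right) & \text{for } |x - a| < 2\log^2 n, \\[2ex] O\!\left( \dfrac{1}{n} \right) & \text{for all } x \end{cases} \right]. \tag{41}$$

In particular, for $x = a$ this gives an important formula:

$$\Omega(a) = \Phi(\vartheta) e^{\vartheta a} \left[ \frac{1}{(2\pi B)^{1/2}} + O(n^{-3/2}) \right]. \tag{42}$$

This formula gives an approximate expression for the quantity $\Omega(a)$
corresponding to the surface of constant energy $\Sigma_a$ in terms of the generating
function $\Phi(\alpha)$ and its second logarithmic derivative for $\alpha = \vartheta$,
where $\vartheta$ is determined by the basic relation

$$\left( -\frac{d \log \Phi(\alpha)}{d\alpha} \right)_{\alpha = \vartheta} = a.$$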
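To get a feeling for the accuracy of (42), one may compare it with a case in which the structure function is known exactly. The sketch below assumes $n$ identical toy molecules with structure function $x^{k-1}/\Gamma(k)$, so that the system has structure function $a^{nk-1}/\Gamma(nk)$ and generating function $\alpha^{-nk}$; the model and the function names are assumptions of this example. Logarithms are used to avoid overflow for large $n$.

```python
import math

def log_omega_exact(a, n, k):
    """log of the exact structure function a^(nk-1)/Gamma(nk) of n toy molecules."""
    return (n * k - 1) * math.log(a) - math.lgamma(n * k)

def log_omega_approx(a, n, k):
    """log of the leading term of (42): Phi(theta) e^(theta a) / (2 pi B)^(1/2)."""
    theta = n * k / a                  # root of -d log Phi/d alpha = a for Phi = alpha^(-nk)
    B = n * k / theta ** 2             # second logarithmic derivative at alpha = theta
    log_phi = -n * k * math.log(theta)
    return log_phi + theta * a - 0.5 * math.log(2 * math.pi * B)

def relative_error(a, n, k):
    """(approximate value) / (exact value) - 1."""
    return math.expm1(log_omega_approx(a, n, k) - log_omega_exact(a, n, k))

for n in (100, 1000, 10000):
    print(n, relative_error(2.0 * n, n, 1.5))   # energy 2.0 per molecule, assumed
```

In this model the comparison reduces to Stirling's formula for $\Gamma(nk)$, and the relative error decreases in proportion to $1/n$, in agreement with the remainder term of (42).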


20. A small component and its energy. Boltzmann's law.
We have seen in section 15, Chapter IV that for a system
$G$ with the constant energy $a$ which can be split into two
components $G_1$ and $G_2$, the distribution law of the separate
component $G_1$ in its phase space has the probability density

$$\frac{\Omega_2(a - E_1)}{\Omega(a)}, \tag{43}$$

where $\Omega_2(x)$ is the structure function of the component $G_2$ and
$E_1$ is that function of the dynamic coordinates of the
component $G_1$ which expresses its energy at the appropriate point
of its phase space. Let us now assume, as is usually the case in
statistical mechanics, that the system $G$ is composed of a
very large number $n$ of separate components which we will
call, for brevity, molecules. We will assume that these molecules
are not very different in respect to their structure, so that we
will be able to use the approximate formulae derived in the
previous section; as already indicated, the necessary conditions
are always satisfied in actual physical problems. Let the
components $G_1$ and $G_2$ consist of $n_1$ and $n_2$ $(n_1 + n_2 = n)$ molecules.
Let us also assume that the molecules forming the component
$G_1$ possess the structure functions $\omega_1(x), \ldots, \omega_{n_1}(x)$, the
generating functions $\varphi_1(\alpha), \ldots, \varphi_{n_1}(\alpha)$, and the conjugate
functions $u_1^{(\vartheta)}(x), \ldots, u_{n_1}^{(\vartheta)}(x)$, where $\vartheta$ is determined as the
single root of the equation:

$$-\frac{d \log \Phi(\alpha)}{d\alpha} = a.$$

Writing $a_k$ and $b_k$ for the mathematical expectation and the
dispersion of the conjugate distribution $u_k^{(\vartheta)}(x)$, we have,
because of the formulae (35) and (36) (section 17, Chapter IV),

$$a_k = -\left( \frac{d \log \varphi_k(\alpha)}{d\alpha} \right)_{\alpha = \vartheta}, \qquad b_k = \left( \frac{d^2 \log \varphi_k(\alpha)}{d\alpha^2} \right)_{\alpha = \vartheta},$$

which we will write in the shorter form:

$$a_k = -\frac{d \log \varphi_k}{d\vartheta}, \qquad b_k = \frac{d^2 \log \varphi_k}{d\vartheta^2}.$$

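Since, because of the assumed enumeration of the molecules,

$$\Phi(\alpha) = \prod_{k=1}^{n} \varphi_k(\alpha), \qquad \Phi_1(\alpha) = \prod_{k=1}^{n_1} \varphi_k(\alpha), \qquad \Phi_2(\alpha) = \prod_{k=n_1+1}^{n} \varphi_k(\alpha),$$

we have

$$A_1 = -\frac{d \log \Phi_1}{d\vartheta} = \sum_{k=1}^{n_1} a_k, \qquad A_2 = -\frac{d \log \Phi_2}{d\vartheta} = \sum_{k=n_1+1}^{n} a_k, \qquad A_1 + A_2 = a,$$

and also

$$B_1 = \frac{d^2 \log \Phi_1}{d\vartheta^2} = \sum_{k=1}^{n_1} b_k, \qquad B_2 = \frac{d^2 \log \Phi_2}{d\vartheta^2} = \sum_{k=n_1+1}^{n} b_k, \qquad B_1 + B_2 = B.$$

From these relations it follows that the quantities $a$ and $B$
must be considered as infinitely large quantities of order $n$.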

Let us assume now that the component $G_1$ represents a
negligibly small part of the entire system $G$, i.e. that $n_1$ is
negligibly small in comparison with $n$ (in particular, we can
assume $n_1 = 1$, taking one single molecule as the component
$G_1$). In our asymptotic formulae this condition is expressed by:

$$n_1 = o(n) \qquad (n \to \infty).$$

Since we have agreed to consider all quantities $a_k$, as well as
all quantities $b_k$, as being of the same order of magnitude, we
can conclude from the above group of formulae that:

$$A_1 = o(a), \qquad B_1 = o(B) \qquad (n \to \infty),$$

and consequently

$$A_2 \sim a, \qquad B_2 \sim B \qquad (n \to \infty).$$

Keeping this in mind let us apply the approximate formulae
of the previous section to the expression (43). Because of the
formula (42) section 19:

$$\Omega(a) = \frac{\Phi(\vartheta) e^{\vartheta a}}{(2\pi B)^{1/2}} \{ 1 + o(1) \}. \tag{44}$$

To get the expression for $\Omega_2(a - E_1)$ we will use the formula
(41) of section 19. Obviously we must write $a - E_1$
instead of $x$, $\Phi_2(\vartheta)$ instead of $\Phi(\vartheta)$, $B_2$ instead of $B$, and $A_2$
instead of $a$. Since $A_2 = a - A_1$, the difference $x - a$ will be
replaced by $A_1 - E_1$. In the remaining terms we can keep
$n$ (rather than substituting $n_2$ for it) because of the relation
$n_2 \sim n$. Thus we find


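$$\Omega_2(a - E_1) = \Phi_2(\vartheta) e^{\vartheta (a - E_1)} \left[ \frac{1}{(2\pi B_2)^{1/2}} \exp\left\{ -\frac{(E_1 - A_1)^2}{2B_2} \right\} + O\!\left( \frac{1}{n} \right) \right],$$

where $A_1 = o(a) = o(n)$. If we consider only such values of
$E_1$ for which $E_1 - A_1 = o(n^{1/2})$, the bracket on the right side
of the above formula becomes:

$$\frac{1}{(2\pi B_2)^{1/2}} \{ 1 + o(1) \},$$

and we will have:

$$\Omega_2(a - E_1) = \frac{\Phi_2(\vartheta) e^{\vartheta (a - E_1)}}{(2\pi B_2)^{1/2}} \{ 1 + o(1) \}.$$

Comparing this with the formula (44), and noticing that
$B_2 \sim B$ and $\Phi(\vartheta) = \Phi_1(\vartheta) \Phi_2(\vartheta)$, we obtain:

$$\frac{\Omega_2(a - E_1)}{\Omega(a)} = \frac{e^{-\vartheta E_1}}{\Phi_1(\vartheta)} \{ 1 + o(1) \} \qquad (E_1 - A_1 = o(n^{1/2})). \tag{45}$$

Thus we obtain for the distribution law of a small component,
in its corresponding phase space, a very simple asymptotic
formula (Boltzmann's law). The most important feature of this
law is its exponential dependence on the energy of the small
component in question, and the important role of the parameter
$\vartheta$ suggests that this parameter must have a simple physical
interpretation.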

Considering the energy $E_1$ of our small component, we can
write for its probability density (according to section 15,
Chapter IV)

$$\frac{\Omega_1(x) \, \Omega_2(a - x)}{\Omega(a)}.$$

According to formula (45) we can put $|x - A_1| = o(n^{1/2})$ and
write the above expression in the form:

$$\frac{\Omega_1(x) e^{-\vartheta x}}{\Phi_1(\vartheta)} \{ 1 + o(1) \}. \tag{46}$$

It must be noted that we have obtained for the approximate
expression of the energy distribution of a small component the
exact conjugate function of this component

$$U_1^{(\vartheta)}(x) = \frac{\Omega_1(x) e^{-\vartheta x}}{\Phi_1(\vartheta)}.$$

It is, of course, important that the parameter $\alpha$ assumes the
value $\vartheta$, i.e. satisfies the equation:

$$\frac{d \log \Phi(\vartheta)}{d\vartheta} + a = 0.$$

Thus, we see that the conjugate distribution law of a small
component (in particular of a single molecule), taken for $\alpha = \vartheta$,
permits a simple physical interpretation as the energy
distribution of this component.

When $G_1$ is a single molecule, $A_1 = a_1$ remains constant for
increasing $n$, and the formula (46) applies uniformly when $x$
varies within arbitrary constant limits.
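The statement that (46) applies uniformly on a fixed energy range can be illustrated for a single molecule. The sketch below assumes a gas of $n$ identical toy molecules with structure function $x^{k-1}/\Gamma(k)$ and a fixed mean energy per molecule; for this model both the exact density (27) and the conjugate density are available in closed form. The model, the energy grid and the function names are assumptions of the example.

```python
import math

def exact_density(x, a, n, k):
    """Formula (27) for one molecule of a gas of n identical toy molecules:
    omega(x) * Omega_2(a - x) / Omega(a), with Omega_2 built from the other n - 1."""
    if not 0 < x < a:
        return 0.0
    log_val = ((k - 1) * math.log(x) - math.lgamma(k)
               + ((n - 1) * k - 1) * math.log(a - x) - math.lgamma((n - 1) * k)
               - (n * k - 1) * math.log(a) + math.lgamma(n * k))
    return math.exp(log_val)

def boltzmann_density(x, a, n, k):
    """Conjugate density omega(x) e^(-theta x) / phi(theta) with phi(alpha) = alpha^(-k)
    and theta = n k / a, the root of -d log Phi/d alpha = a for this model."""
    theta = n * k / a
    return theta ** k * x ** (k - 1) * math.exp(-theta * x) / math.gamma(k)

def max_relative_gap(n, k=1.5, energy_per_molecule=1.0):
    """Largest relative deviation between the two densities on a fixed energy grid."""
    a = n * energy_per_molecule
    xs = [0.1 * j for j in range(1, 51)]        # energies 0.1, 0.2, ..., 5.0
    return max(abs(exact_density(x, a, n, k) / boltzmann_density(x, a, n, k) - 1)
               for x in xs)

for n in (100, 1000, 10000):
    print(n, max_relative_gap(n))
```

The printed deviation should shrink as $n$ grows while the energy grid stays fixed, which is the behaviour described for a single molecule.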

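The probability that the molecule will have an energy
between $g_1$ and $g_2$ is given (for the $i$-th molecule) by the formula:

$$\frac{1}{\varphi_i(\vartheta)} \int_{g_1}^{g_2} \omega_i(x) e^{-\vartheta x} \, dx \, \{ 1 + o(1) \}.$$

Hence the mathematical expectation of the number of
molecules with the energy between $g_1$ and $g_2$ is given by:

$$\sum_{i=1}^{n} \frac{1}{\varphi_i(\vartheta)} \int_{g_1}^{g_2} \omega_i(x) e^{-\vartheta x} \, dx \, \{ 1 + o(1) \},$$

and in the case when all molecules have identical structure
(structure function $\omega(x)$) we write:

$$\frac{n}{\varphi(\vartheta)} \int_{g_1}^{g_2} \omega(x) e^{-\vartheta x} \, dx \, \{ 1 + o(1) \}.$$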


21. Mean values of the sum functions. In the present
section we will consider the small component $G_1$ as being a
separate molecule; since the enumeration of molecules is
immaterial we will write $\omega_1(x)$ and $\varphi_1(\alpha)$ for the corresponding
structure and generating functions of the selected molecule $g_1$.

Each phase function $f(x_1, x_2, \ldots)$, depending only on the
dynamical coordinates of the molecule $g_1$, can be interpreted as
a function $f(P)$ of the point $P$ in the phase space $\gamma_1$ of this
molecule. Since the set of dynamic coordinates of the molecule
$g_1$ has the probability density

$$\frac{\Omega^{(1)}(a - e_1)}{\Omega(a)},$$

where $\Omega^{(1)}(x)$ is the structure function of the complementary
system $G - g_1$, and $e_1$ is the energy of the molecule $g_1$, the
mean value of the function $f$ is given by:

$$\bar{f} = \int_{\gamma_1} f(P) \frac{\Omega^{(1)}(a - e_1)}{\Omega(a)} \, dv_1,$$

where $dv_1$ stands for the volume element in the phase space $\gamma_1$
of the molecule $g_1$, and it is assumed that the above integral
converges absolutely.

Using the asymptotic formulae derived above, we can obtain
an approximate expression of this integral, and estimate the
corresponding error. To do this we break up the space $\gamma_1$ into
two parts: $\gamma_1'$ being the set of points in the space $\gamma_1$ for which
$e_1 < \log^2 n$, and $\gamma_1''$ being the set of all other points. We
put:

$$\int_{\gamma_1'} f(P) \frac{\Omega^{(1)}(a - e_1)}{\Omega(a)} \, dv_1 = I', \qquad \int_{\gamma_1''} f(P) \frac{\Omega^{(1)}(a - e_1)}{\Omega(a)} \, dv_1 = I'',$$

so that

$$\bar{f} = I' + I''.$$

In order to guarantee the convergence of our integral, we
will assume that, for large energy values $e_1$, the absolute value
of $f(P)$ increases not faster than a certain power of this energy;
i.e. that:

$$f(P) = O(e_1^k) \qquad (e_1 \to \infty).$$
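Next, let us evaluate the integral $I''$. Putting, as usual,

$$-\frac{d \log \varphi_1}{d\vartheta} = a_1, \qquad \frac{d^2 \log \varphi_1}{d\vartheta^2} = b_1,$$

and writing $\Phi^{(1)}(\alpha)$ for the generating function of the system
$G - g_1$, we can write (according to the formula (41) section 19):

$$\Omega^{(1)}(a - e_1) = \Phi^{(1)}(\vartheta) e^{\vartheta (a - e_1)} \left[ \frac{1}{[2\pi (B - b_1)]^{1/2}} \exp\left\{ -\frac{(e_1 - a_1)^2}{2(B - b_1)} \right\} + O\!\left( \frac{1}{n} \right) \right],$$

from which follows, for sufficiently large $n$,

$$\Omega^{(1)}(a - e_1) < 2 \Phi^{(1)}(\vartheta) \frac{e^{\vartheta (a - e_1)}}{(2\pi B)^{1/2}}.$$

On the other hand the formula (42) section 19 gives us, for
sufficiently large $n$,

$$\Omega(a) > \frac{1}{2} \Phi(\vartheta) \frac{e^{\vartheta a}}{(2\pi B)^{1/2}},$$

which leads to:

$$\frac{\Omega^{(1)}(a - e_1)}{\Omega(a)} < \frac{4 e^{-\vartheta e_1}}{\varphi_1(\vartheta)},$$

and consequently

$$|I''| \le \int_{\gamma_1''} |f(P)| \frac{\Omega^{(1)}(a - e_1)}{\Omega(a)} \, dv_1 < C \int_{\gamma_1''} \frac{e_1^k e^{-\vartheta e_1}}{\varphi_1(\vartheta)} \, dv_1,$$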

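where $C$ is a positive constant. Because of the general formula
(18) section 8, Chapter II,

$$\int_{\gamma_1''} e_1^k e^{-\vartheta e_1} \, dv_1 = \int_{\log^2 n}^{\infty} x^k \omega_1(x) e^{-\vartheta x} \, dx,$$

we obtain

$$|I''| < \frac{C}{\varphi_1(\vartheta)} \int_{\log^2 n}^{\infty} x^k \omega_1(x) e^{-\vartheta x} \, dx < \frac{C}{\varphi_1(\vartheta)} \exp\left\{ -\frac{\vartheta}{2} \log^2 n \right\} \int_{\log^2 n}^{\infty} x^k \omega_1(x) e^{-\vartheta x/2} \, dx.$$

Since, finally, the last integral in the above expression tends to
zero for $n \to \infty$, we obtain, for sufficiently large $n$:

$$|I''| < \exp\left\{ -\frac{\vartheta}{2} \log^2 n \right\} < \frac{1}{n}. \tag{47}$$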
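Let us now evaluate the integral $I'$. Because of the formula
(41) section 19, we have, for $e_1 < a_1 + \log^2 n$:

$$\begin{aligned}
\Omega^{(1)}(a - e_1) &= \Phi^{(1)}(\vartheta) e^{\vartheta (a - e_1)} \left[ \frac{1}{[2\pi (B - b_1)]^{1/2}} \exp\left\{ -\frac{(e_1 - a_1)^2}{2(B - b_1)} \right\} + O\!\left( \frac{1 + |e_1 - a_1|}{n^{3/2}} \right) \right] \\
&= \Phi^{(1)}(\vartheta) e^{\vartheta (a - e_1)} \frac{1}{(2\pi B)^{1/2}} \left\{ 1 + O\!\left( \frac{1 + (e_1 - a_1)^2}{n} \right) \right\},
\end{aligned}$$

whereas the formula (42) section 19 gives

$$\Omega(a) = \frac{\Phi(\vartheta) e^{\vartheta a}}{(2\pi B)^{1/2}} \left\{ 1 + O\!\left( \frac{1}{n} \right) \right\}.$$

It follows that

$$\frac{\Omega^{(1)}(a - e_1)}{\Omega(a)} = \frac{e^{-\vartheta e_1}}{\varphi_1(\vartheta)} \left\{ 1 + O\!\left( \frac{1 + (e_1 - a_1)^2}{n} \right) \right\},$$

and consequently

$$I' = \int_{\gamma_1'} f(P) \frac{e^{-\vartheta e_1}}{\varphi_1(\vartheta)} \left\{ 1 + O\!\left( \frac{1 + (e_1 - a_1)^2}{n} \right) \right\} dv_1 = \int_{\gamma_1'} f(P) \frac{e^{-\vartheta e_1}}{\varphi_1(\vartheta)} \, dv_1 + O\!\left( \frac{1}{n} \right).$$

The integration over the space $\gamma_1'$ can be extended over $\gamma_1$
without difficulty since, as we have seen above,

$$\int_{\gamma_1''} |f(P)| \frac{e^{-\vartheta e_1}}{\varphi_1(\vartheta)} \, dv_1 = O\!\left( \frac{1}{n} \right).$$

Thus, taking into account (47), we obtain:

$$\bar{f} = \int_{\gamma_1} f(P) \frac{e^{-\vartheta e_1}}{\varphi_1(\vartheta)} \, dv_1 + O\!\left( \frac{1}{n} \right). \tag{48}$$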
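In particular, when $f(P) = \chi(e_1)$ is a function of the energy of
the selected molecule we obtain, because of the formula (19)
section 8, Chapter II,

$$\bar{\chi} = \int \chi(x) \frac{e^{-\vartheta x}}{\varphi_1(\vartheta)} \omega_1(x) \, dx + O\!\left( \frac{1}{n} \right) = \int \chi(x) \, u_1^{(\vartheta)}(x) \, dx + O\!\left( \frac{1}{n} \right).$$

Thus, in particular,

$$\bar{e}_1 = \int x \, u_1^{(\vartheta)}(x) \, dx + O\!\left( \frac{1}{n} \right) = a_1 + O\!\left( \frac{1}{n} \right), \tag{49}$$

and

$$\overline{(e_1 - a_1)^2} = \int (x - a_1)^2 u_1^{(\vartheta)}(x) \, dx + O\!\left( \frac{1}{n} \right) = b_1 + O\!\left( \frac{1}{n} \right),$$

etc. These formulae emphasize the role of the conjugate function
$u_1^{(\vartheta)}(x)$ as the approximate distribution law for the energy of
a molecule.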


Most of the phase functions which we encounter in statistical
mechanics have a very special form. They are almost always
sums of functions each depending on the dynamic
coordinates of only one molecule. Such phase functions we will
call sum functions. Thus if a system $G$ is formed by the
molecules $g_1, g_2, \ldots, g_n$ corresponding to the phase spaces
$\gamma_1, \gamma_2, \ldots, \gamma_n$, a sum function can be written as:

$$f(P) = \sum_{i=1}^{n} f_i(P_i),$$

where $P_i$ is some point in the space $\gamma_i$ $(i = 1, 2, \ldots, n)$. Since
the mathematical expectation of a sum is always equal to the
sum of the mathematical expectations of its terms, we obtain
(using formula (48)) the following approximate expression for
the phase average of such a sum function:

$$\bar{f} = \sum_{i=1}^{n} \bar{f}_i = \sum_{i=1}^{n} \frac{1}{\varphi_i(\vartheta)} \int_{\gamma_i} f_i(P_i) e^{-\vartheta e_i} \, dv_i + O(1) \tag{50}$$

(it goes without saying that the functions $f_i(P_i)$ must satisfy
the general assumptions used in the derivation of the formula
(48)).

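Example 1. The number of molecules with energy within certain
limits. Let $0 \le \alpha < \beta \le +\infty$ and:

\[ f_i(P_i) = \begin{cases} 1, & \text{if } \alpha < e_i < \beta, \\ 0 & \text{in all other cases.} \end{cases} \]

The sum function

\[ f(P) = \sum_{i=1}^{n} f_i(P_i) \]

represents, apparently, the number $n_\alpha^\beta$ of the molecules of the
given system with the energy between $\alpha$ and $\beta$. According to
(49):

\[ \overline{n_\alpha^\beta} = \bar f = \sum_{i=1}^{n} \frac{1}{\varphi_i(\vartheta)} \int_\alpha^\beta \omega_i(x)e^{-\vartheta x}\,dx + O(1)
   = \sum_{i=1}^{n} \int_\alpha^\beta u_i^{(\vartheta)}(x)\,dx + O(1). \]

In particular when all molecules are identical we have:

\[ \overline{n_\alpha^\beta} = n \int_\alpha^\beta u^{(\vartheta)}(x)\,dx + O(1). \]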


Example 2. The energy of a large component. Let $G_1$ be the
component of the system G consisting of the molecules
$g_1, g_2, \ldots, g_{n_1}$. Let $E_1$ be the energy of this component and

\[ f_i(P_i) = \begin{cases} e_i & (1 \le i \le n_1), \\ 0 & (n_1 < i \le n). \end{cases} \]

It is apparent that

\[ E_1 = \sum_{i=1}^{n} f_i(P_i) \]

so that formula (50) gives us:

\[ \bar E_1 = \sum_{i=1}^{n_1} a_i + O(1) = -\sum_{i=1}^{n_1} \frac{d\log\varphi_i(\vartheta)}{d\vartheta} + O(1). \tag{51} \]

Let us note here that we cannot use the same method for the
evaluation of the dispersion of $E_1$, since the energies of different
molecules forming the component $G_1$ are not independent so
that the dispersion of their sum is different from the sum of
their dispersions. This considerably more complicated question
will be discussed later (Chapter VIII).


22. Energy distribution law of a large component. The
derivation of the asymptotic distribution law for the sum function
considered as a random quantity is a rather difficult
problem and will be considered in detail in one of the following
chapters (Chapter VIII). However, in the most important case
of the function $E_1$ considered in the previous section, the
problem can be solved comparatively simply.

Let $G_2$ be the component complementary to $G_1$, and let
$E_2$ be its energy. We also put $n_2 = n - n_1$ and consider $n_1$
and $n_2$ as being infinitely large quantities of the order n. In
general we will use the indices 1 and 2 to denote the quantities
pertaining to $G_1$ and $G_2$; in particular we put:

\[ A_1 = \sum_{i=1}^{n_1} a_i, \qquad A_2 = a - A_1 = \sum_{i=n_1+1}^{n} a_i, \]

\[ B_1 = \sum_{i=1}^{n_1} b_i, \qquad B_2 = B - B_1 = \sum_{i=n_1+1}^{n} b_i. \]
According to the formula (27) section 15, Chapter IV the
probability density of $E_1$ is given by

\[ \frac{\Omega_1(x)\,\Omega_2(a - x)}{\Omega(a)}. \]


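According to (41) and (42) of section 19:

\[ \Omega_1(x) = \Phi_1(\vartheta)e^{\vartheta x} \left\{ \frac{1}{(2\pi B_1)^{1/2}} \exp\left[-\frac{(x - A_1)^2}{2B_1}\right] + O\!\left(\frac{1}{n}\right) \right\}, \]

\[ \Omega_2(a - x) = \Phi_2(\vartheta)e^{\vartheta(a - x)} \left\{ \frac{1}{(2\pi B_2)^{1/2}} \exp\left[-\frac{(a - x - A_2)^2}{2B_2}\right] + O\!\left(\frac{1}{n}\right) \right\}, \]

\[ \Omega(a) = \frac{\Phi(\vartheta)e^{\vartheta a}}{(2\pi B)^{1/2}} \left\{1 + O\!\left(\frac{1}{n}\right)\right\}. \]

Writing, for brevity, $(B_1 B_2)/B = B^*$ and remembering that,
because of the law of composition for the generating functions,
$\Phi(\vartheta) = \Phi_1(\vartheta)\Phi_2(\vartheta)$, we get (since $a - x - A_2 = A_1 - x$):

\[ \frac{\Omega_1(x)\,\Omega_2(a - x)}{\Omega(a)}
   = \frac{1}{(2\pi B^*)^{1/2}} \exp\left\{-\frac{(x - A_1)^2}{2B_1} - \frac{(x - A_1)^2}{2B_2}\right\} + O\!\left(\frac{1}{n}\right)
   = \frac{1}{(2\pi B^*)^{1/2}} \exp\left\{-\frac{(x - A_1)^2}{2B^*}\right\} + O\!\left(\frac{1}{n}\right). \tag{52} \]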


Thus, for $n \to \infty$, the distribution of $E_1$ is given by the Gauss
distribution function with the maximum at $A_1$ and the dispersion

\[ B^* = \frac{B_1 B_2}{B}. \]

If the energies of the molecules composing the component
$G_1$ were mutually independent the dispersion of their sum
would be

\[ \sum_{i=1}^{n_1} b_i = B_1. \]

Thus we see that the true dispersion $B^* < B_1$, which is
quite understandable. In fact, since the sum of energies of all
n molecules is fixed, the energies of individual molecules are
correlated negatively, so that the dispersion of their sum is
smaller than the sum of their dispersions.
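
As a numerical illustration of the inequality $B^* < B_1$, the following sketch assumes the special case of a monatomic ideal gas (treated in section 23), where the structure function of k molecules is proportional to $x^{(3k/2)-1}$. Under that assumption the density $\Omega_1(x)\Omega_2(a - x)/\Omega(a)$ is that of a times a Beta($3n_1/2$, $3n_2/2$) variable, whose variance is known in closed form and can be compared with $B_1 B_2/B$. The function name and its arguments are illustrative and do not come from the text.

```python
def energy_dispersions(n1, n2, a):
    """Return (B1, B_star, exact) for an ideal gas split into n1 + n2 molecules.

    Assumption (illustrative, not from the text): point molecules of a
    monatomic ideal gas, so each molecule contributes
    b_i = 3 / (2 * theta**2), with theta = 3n / (2a).
    """
    k1, k2 = 1.5 * n1, 1.5 * n2      # exponents 3*n1/2 and 3*n2/2
    k = k1 + k2
    theta = k / a                     # root of -d(log Phi)/d(theta) = a
    b1 = k1 / theta**2                # dispersion if the molecules were independent
    b2 = k2 / theta**2
    b_star = b1 * b2 / (b1 + b2)      # asymptotic dispersion B* = B1*B2/B
    # Exact variance of a * Beta(k1, k2), the law of E_1 under this assumption.
    exact = a * a * k1 * k2 / (k * k * (k + 1.0))
    return b1, b_star, exact
```

For a large system the exact variance and $B^*$ differ only by the factor $k/(k + 1)$ with $k = 3n/2$, while both stay below $B_1$ by the factor $n_2/n$.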


23. Example of monatomic ideal gas. As an example of
the application of the above discussed general methods we will
consider now the simplest statistical system: a monatomic
ideal gas. This served as the first example in the development
of the basic ideas of statistical mechanics. Under the name
“ideal monatomic gas” we will imagine a system G whose
molecules $g_1, g_2, \ldots, g_n$ are simply material points. As usual,
the total energy of the system is the sum of the energies of the
individual molecules so that the molecules must not possess
any mutual potential energy; as we have seen in section 8,
Chapter II, this condition, which is unavoidable for the
applicability of our methods, is actually never fulfilled in reality
so that we have to consider it as only an approximation. We
will assume that our gas (system G) is contained in a vessel
with the finite volume V; this condition will be expressed
formally by the introduction of a special term $U_i(x_i, y_i, z_i)$
representing the potential of the walls into the expression for
the energy $e_i$ of the molecule $g_i$ (with the coordinates $x_i, y_i, z_i$).
Since we assume that the system G is not subject to any outside
forces we can write

\[ e_i = \frac{m_i}{2}(\dot x_i^2 + \dot y_i^2 + \dot z_i^2) + U_i(x_i, y_i, z_i), \]

where $m_i$ is the mass of the molecule $g_i$, whereas $\dot x_i, \dot y_i, \dot z_i$ are the
components of its velocity. We will assume that the forces
between the walls of the vessel and the molecules are different
from zero only at very small distances from the wall. If we
require that not a single molecule, no matter how fast it is
moving, can penetrate through the walls of the vessel, we
must assume that $U_i$ is infinitely large outside the vessel.
Inside the vessel we can assume $U_i$ to be an arbitrary constant,
putting for simplicity $U_i = 0$. Of course, such a description
of the function $U_i$ ($U_i = 0$ inside, $U_i = +\infty$ outside) is only
an approximate one; it would be more correct to assume that
the function $U_i$ is continuous and increases very rapidly when
the molecule approaches the wall. However, we will use this
idealized concept of $U_i$ since it considerably simplifies the
calculations without essentially affecting the results.

The Hamiltonian dynamic coordinates of the molecule $g_i$
are represented by its three Cartesian coordinates and three
components of its momentum,

\[ p_i = \frac{\partial e_i}{\partial \dot x_i} = m_i\dot x_i, \qquad q_i = m_i\dot y_i, \qquad r_i = m_i\dot z_i. \]

Therefore,

\[ e_i = \frac{1}{2m_i}(p_i^2 + q_i^2 + r_i^2) + U_i(x_i, y_i, z_i), \]

and consequently

\[ E = \sum_{i=1}^{n} \frac{1}{2m_i}(p_i^2 + q_i^2 + r_i^2) + \sum_{i=1}^{n} U_i(x_i, y_i, z_i), \]

where E is the total energy of the system G.
For the function V(x) which expresses the volume of the part of the
space $\Gamma$ where $E < x$ we have the expression:

\[ V(x) = \int_{E<x} dV = \int_{E<x} \prod_{i=1}^{n} dx_i\,dy_i\,dz_i\,dp_i\,dq_i\,dr_i. \]

Since outside of the vessel the potential energy, and consequently
the total energy E, of the system G is infinitely large,
the integration is carried out only inside the vessel. Since, on
the other hand, the potential energy inside is equal to zero,
we have:

\[ V(x) = V^n \int_{\sum_{i=1}^{n} \frac{1}{2m_i}(p_i^2 + q_i^2 + r_i^2) < x} \; \prod_{i=1}^{n} dp_i\,dq_i\,dr_i. \]


The above integral represents the volume of an ellipsoid in the
3n-dimensional space with the semi-axes $(2m_i x)^{1/2}$ ($i = 1,
2, \ldots, n$), each taken three times. This gives us

\[ V(x) = V^n \frac{(2\pi)^{3n/2}}{\Gamma[(3n/2) + 1]} \prod_{i=1}^{n} m_i^{3/2}\; x^{3n/2}. \]

Thus the structure function of the system G can be written as

\[ \Omega(x) = V'(x) = V^n \frac{(2\pi)^{3n/2}}{\Gamma[(3n/2) + 1]}\,\frac{3n}{2} \prod_{i=1}^{n} m_i^{3/2}\; x^{(3n/2)-1}. \tag{53} \]

In this elementary example the expression for the structure
function is so simple that one does not require an approximate
formula. However, for purposes of illustration we will construct
the asymptotic formula for $\Omega(x)$ and will compare it
with the exact expression (53). For the generating function
$\Phi(\vartheta)$ of the system G we have:

\[ \Phi(\vartheta) = \int_0^\infty \Omega(x)e^{-\vartheta x}\,dx
   = V^n \frac{(2\pi)^{3n/2}}{\Gamma[(3n/2) + 1]}\,\frac{3n}{2} \prod_{i=1}^{n} m_i^{3/2} \int_0^\infty x^{(3n/2)-1}e^{-\vartheta x}\,dx
   = V^n (2\pi)^{3n/2} \prod_{i=1}^{n} m_i^{3/2}\; \vartheta^{-3n/2}. \tag{54} \]

If x is the total energy of the system G, the quantity $\vartheta$ is
determined as the root of the equation:

\[ -\frac{d\log\Phi}{d\vartheta} = \frac{3n}{2\vartheta} = x; \]

thus,

\[ \vartheta = \frac{3n}{2x} \tag{55} \]

and consequently:

\[ \Phi(\vartheta) = V^n (2\pi)^{3n/2} \prod_{i=1}^{n} m_i^{3/2} \left(\frac{3n}{2}\right)^{-3n/2} x^{3n/2}. \]

From the formula (42) section 19, where

\[ B = \frac{d^2\log\Phi}{d\vartheta^2} = \frac{3n}{2\vartheta^2} = \frac{2x^2}{3n}, \]

we obtain the asymptotic expression:

\[ \Omega(x) \approx \frac{\Phi(\vartheta)e^{\vartheta x}}{[2\pi(d^2\log\Phi/d\vartheta^2)]^{1/2}}
   = V^n (2\pi)^{3n/2} \prod_{i=1}^{n} m_i^{3/2} \left(\frac{3n}{4\pi}\right)^{1/2} \left(\frac{2e}{3n}\right)^{3n/2} x^{(3n/2)-1}. \tag{56} \]

Comparing the approximate expression (56) with the exact
expression (53) we see that in our simple case the method leads
to the substitution of the quantity $\Gamma[(3n/2) + 1]$ in the formula
(53) by its asymptotic expression given by Stirling's formula.

We will be satisfied for the time being with considering only
this simplest system as an illustration of our general method;
the complete theory of the monatomic ideal gas will be
discussed in the next chapter.
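
The comparison between the exact and the asymptotic structure function can be checked numerically. The sketch below works with logarithms to avoid overflow and assumes, purely for illustration, that all molecules have the same mass; the function names and default arguments are not from the text. It uses the equivalent form $1/\Gamma(3n/2)$ of the exact coefficient.

```python
import math


def log_omega_exact(n, x, volume=1.0, mass=1.0):
    """log of the exact structure function, written with 1/Gamma(3n/2)."""
    z = 1.5 * n
    return (n * math.log(volume) + z * math.log(2.0 * math.pi)
            + z * math.log(mass) - math.lgamma(z) + (z - 1.0) * math.log(x))


def log_omega_asymptotic(n, x, volume=1.0, mass=1.0):
    """log of Phi(theta) * exp(theta * x) / sqrt(2 * pi * B)."""
    z = 1.5 * n
    theta = z / x                                 # theta = 3n / (2x)
    log_phi = (n * math.log(volume) + z * math.log(2.0 * math.pi)
               + z * math.log(mass) - z * math.log(theta))
    b = z / theta**2                              # B = d^2 log(Phi) / d theta^2
    return log_phi + theta * x - 0.5 * math.log(2.0 * math.pi * b)


def log_ratio(n, x):
    """log(exact / asymptotic); by Stirling's series it is close to -1/(18n)."""
    return log_omega_exact(n, x) - log_omega_asymptotic(n, x)
```

The difference of the logarithms does not depend on x, V or the mass, and it shrinks like 1/n, in agreement with the relative error $O(1/n)$ of the general formula.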


24. The theorem of equipartition of energy. We have seen
(section 20) that the conjugate distribution function

\[ u_i^{(\vartheta)}(x) = \frac{\omega_i(x)e^{-\vartheta x}}{\varphi_i(\vartheta)} \]

for a small component (in particular for a molecule) of a given
system G represents an approximate expression of the energy
distribution of this component. In the case of the monatomic
ideal gas we have

\[ \omega_i(x) = V \frac{(2\pi m_i)^{3/2}}{\Gamma(3/2)}\,x^{1/2} = 2\pi V (2m_i)^{3/2} x^{1/2}, \]

\[ \varphi_i(\vartheta) = (2\pi m_i)^{3/2}\,V\,\vartheta^{-3/2} \]

(this formula could be derived in a similar way as the formulae
(53) and (54) of section 23 or can be obtained directly as
the special case of these formulae for n = 1). Thus:

\[ u_i^{(\vartheta)}(x) = \frac{2}{\pi^{1/2}}\,\vartheta^{3/2} x^{1/2} e^{-\vartheta x}. \]

In particular, for the mean value of the energy $e_i$ of the molecule
$g_i$ we have an approximate expression:

\[ \bar e_i \approx \int_0^\infty x\,u_i^{(\vartheta)}(x)\,dx = -\frac{d\log\varphi_i}{d\vartheta} = \frac{3}{2\vartheta}. \tag{57} \]
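
The energy law of a single molecule can be explored numerically. The sketch below assumes the density $(2/\sqrt{\pi})\,\vartheta^{3/2} x^{1/2} e^{-\vartheta x}$ for $x > 0$; its distribution function has the closed form $\operatorname{erf}(\sqrt{\vartheta x}) - (2/\sqrt{\pi})\sqrt{\vartheta x}\,e^{-\vartheta x}$, obtained by one integration by parts. The helper names are illustrative, and `expected_count` simply applies the formula of Example 1 of section 21 to identical molecules with finite limits.

```python
import math


def energy_cdf(x, theta):
    """P(e_i < x) for the density (2/sqrt(pi)) theta^1.5 sqrt(x) exp(-theta x)."""
    if x <= 0.0:
        return 0.0
    y = theta * x
    return (math.erf(math.sqrt(y))
            - 2.0 / math.sqrt(math.pi) * math.sqrt(y) * math.exp(-y))


def expected_count(n, alpha, beta, theta):
    """Approximate mean number of molecules with energy in (alpha, beta).

    alpha and beta are assumed finite.
    """
    return n * (energy_cdf(beta, theta) - energy_cdf(alpha, theta))


def mean_energy(theta, steps=20000):
    """Midpoint-rule estimate of the mean energy; should be close to 3/(2 theta)."""
    upper = 60.0 / theta                  # the tail beyond this point is negligible
    h = upper / steps
    c = 2.0 / math.sqrt(math.pi) * theta**1.5
    total = 0.0
    for j in range(steps):
        x = (j + 0.5) * h
        total += x * c * math.sqrt(x) * math.exp(-theta * x)
    return total * h
```

Neither function involves the mass of the molecule or the volume of the vessel, which is the point made in the discussion of formula (57).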


In the case of an ideal monatomic gas we have considered all
molecules as possessing identical structure (i.e. the identical
expression for energy in terms of the dynamic coordinates),
although their masses could be different. We see now that the
mean value of energy as well as its distribution law is the same
for all molecules independent of their masses. Thus, in the
mixture of two gases, a heavy one and a light one, the mean
energy of a molecule is the same for both components. Furthermore
this mean energy value and its distribution is independent
of the volume of the vessel, being a universal function of the
parameter $\vartheta$. This result, pertaining to the mean energy of a
molecule, represents a special case of a general theorem of
statistical mechanics known as “the law of equipartition of
energy among the degrees of freedom”. The importance of this
theorem lies in the fact that in many cases it permits us to
find the mean energy of one or the other component of the
system almost without any calculation. We will now give the
general proof of this theorem, omitting however some possible
generalizations.

Consider a system G whose component $G_1$ possesses t degrees
of freedom and is described by the Hamiltonian variables
$q_1, \ldots, q_t, p_1, \ldots, p_t$. Let us assume that the total energy
$E_1$ of the system $G_1$ is its kinetic energy. This can be written
generally as a quadratic form in $p_1, p_2, \ldots, p_t$, with coefficients
which may depend on $q_1, q_2, \ldots, q_t$. Let us denote
it by $H(q_i, p_k)$. Writing, as usual, $V_1(x)$ for the volume of that
part of the phase space occupied by the component $G_1$ where
$E_1 < x$, we have

\[ V_1(x) = \int dq_1 \cdots dq_t \int_{H(q_i, p_k) < x} dp_1 \cdots dp_t, \]

where the inner integral for the fixed values of $q_1, q_2, \ldots, q_t$
is evaluated within an ellipsoid in the t-dimensional space,
representing its volume. It is clear that this volume is
proportional to $x^{t/2}$ and that the coefficient depends on
$q_1, q_2, \ldots, q_t$. Thus:

\[ V_1(x) = x^{t/2} \int W(q_1, \ldots, q_t)\,dq_1 \cdots dq_t = c_1 x^{t/2}, \]

where $c_1$ (as well as $c_2$ and $c_3$ which will be introduced later) is
a positive constant. We get from this

\[ \Omega_1(x) = V_1'(x) = c_2 x^{(t/2)-1} \quad (x > 0), \]

\[ \Phi_1(\vartheta) = c_2 \int_0^\infty x^{(t/2)-1} e^{-\vartheta x}\,dx = c_3 \vartheta^{-t/2}, \]

and consequently

\[ \bar E_1 \approx -\frac{d\log\Phi_1}{d\vartheta} = \frac{t}{2\vartheta}. \tag{58} \]

This relation expresses the theorem which we intended to prove;
in fact it shows that the mean energy of a component of the
given system is proportional to the corresponding number of
degrees of freedom, the coefficient of proportionality being the
quantity $1/(2\vartheta)$. In particular, since for the molecule of
monatomic ideal gas t = 3, the formula (57) derived earlier
represents a particular case of the general formula (58).

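It is interesting to notice that the theorem of equipartition
of energy which was proved using the approximate expression
for the mean energy actually holds for the exact expression (of
course in this case we get a somewhat different coefficient of
proportionality; the quantity $\vartheta$ arose from our approximate
analysis and does not exist at all in the exact theory). In order
to prove this we must notice that the Hamiltonian function
$H(q_i, p_k)$ of the selected component, being a quadratic form
in the variables $p_k$, must satisfy the Euler relation

\[ H = \frac{1}{2} \sum_{k=1}^{t} p_k \frac{\partial H}{\partial p_k}, \]

and therefore

\[ \bar E_1 = \frac{1}{\Omega(a)} \int_{\Sigma_a} H\,\frac{d\Sigma}{\operatorname{grad} E}
   = \frac{1}{2\Omega(a)} \sum_{k=1}^{t} \int_{\Sigma_a} p_k \frac{\partial H}{\partial p_k}\,\frac{d\Sigma}{\operatorname{grad} E}
   = \frac{1}{2\Omega(a)} \sum_{k=1}^{t} \int_{\Sigma_a} p_k \frac{\partial E}{\partial p_k}\,\frac{d\Sigma}{\operatorname{grad} E}, \tag{59} \]

since in the expression of the total energy E the variables $p_k$
only enter in H. Thus:

\[ \frac{\partial E}{\partial p_k} = \frac{\partial H}{\partial p_k} \quad (1 \le k \le t). \]

For each of the surface integrals on the right hand side of
(59) a volume integral can be substituted according to
Green's theorem. Since $(1/\operatorname{grad} E)\,\partial E/\partial p_k$ is the cosine of the
angle between the outward normal to the surface and the
$p_k$-axis, we have

\[ \int_{\Sigma_a} F \frac{\partial E}{\partial p_k}\,\frac{d\Sigma}{\operatorname{grad} E} = \int_{V_a} \frac{\partial F}{\partial p_k}\,dV, \]

where $F(q_i, p_k)$ is an arbitrary function of the dynamic
variables of the system G and $V_a$ denotes the part of the phase
space where $E < a$. In particular

\[ \int_{\Sigma_a} p_k \frac{\partial E}{\partial p_k}\,\frac{d\Sigma}{\operatorname{grad} E} = \int_{V_a} \frac{\partial p_k}{\partial p_k}\,dV = \int_{V_a} dV = V(a), \]

i.e. each such integral is equal to the volume of that part of
the phase space of the system G where E < a. Therefore the
formula (59) gives us:

\[ \bar E_1 = \frac{t}{2}\,\frac{V(a)}{\Omega(a)}, \tag{60} \]

proving the above statement.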

Comparing the exact formula (60) with the formula (58)
which (according to (50) section 21) is exact up to terms of
order 1/n we obtain

\[ \vartheta = \frac{\Omega(a)}{V(a)} + O\!\left(\frac{1}{n}\right) = \frac{d\log V(a)}{da} + O\!\left(\frac{1}{n}\right); \tag{61} \]

this gives us the approximate expression of the parameter $\vartheta$ in
terms of V(a), i.e. the approximate solution of the equation

\[ -\frac{d\log\Phi}{d\vartheta} = a, \]

which determines the quantity $\vartheta$. The formula (61) plays an
important role in some theoretical studies.

Example. Rotational energy of a molecule in diatomic ideal gas.
We will imagine a molecule of a diatomic gas as a pair of
material points connected by a rigid massless axis of infinitely
small length. The position of such a system in space is
determined by five parameters for which we will choose three
Cartesian coordinates x, y, z of one of the points and two
angular coordinates $\varphi$ and $\psi$ determining the direction of the
axis. Writing $p_x, p_y, p_z, p_\varphi, p_\psi$ for the corresponding
momenta, m for the mass of the molecule, and A for the moment
of inertia of the second point with respect to a center of rotation
at the first we can express the total kinetic energy of the
molecule as the sum of the translational energy:

\[ e_t = \frac{1}{2m}(p_x^2 + p_y^2 + p_z^2) \]

and the rotational energy:

\[ e_r = \frac{1}{2A}\left(p_\varphi^2 + \frac{p_\psi^2}{\sin^2\varphi}\right) \tag{62} \]

(we shall see however that a knowledge of this expression is
unimportant for the solution of our problem).

The molecule in question represents a component of the gas
(whose other components may have, however, an entirely
different structure). On the basis of the general definition of a
component (section 8, Chapter II) we can consider each of the
two sets of dynamical variables $(x, y, z, p_x, p_y, p_z)$ and
$(\varphi, \psi, p_\varphi, p_\psi)$ as an individual component of our gas; these
two components can be considered as the fictitious “carriers”
of the energies $e_t$ and $e_r$ corresponding to three and two degrees
of freedom. The determination of the mean energy of either of
these components can be done without calculation by using the
theorem of equipartition of energy. This theorem leads immediately
to our previous formula for the translational energy,
whereas for the rotational energy it gives


\[ \bar e_r = \frac{V(a)}{\Omega(a)} = \left[\frac{d\log V(a)}{da}\right]^{-1} \approx \frac{1}{\vartheta}. \]

Thus we see that the theorem of equipartition of energy
among the degrees of freedom saves us many difficult calculations
which we would have to carry out in the case of a more
complicated statistical system.

For the sake of completeness let us find the approximate
expression for the distribution of $e_r$. Since:

\[ V_r(x) = \int_{e_r < x} d\varphi\,d\psi\,dp_\varphi\,dp_\psi
   = \int d\varphi\,d\psi \int_{p_\varphi^2 + p_\psi^2/\sin^2\varphi < 2Ax} dp_\varphi\,dp_\psi = 8\pi^2 A x \]

(here the inner integral represents the area of an ellipse with
the semi-axes $(2Ax)^{1/2}$ and $(2Ax)^{1/2}\sin\varphi$) we find that the
structure function of the “fictitious carrier” of the energy $e_r$ is

\[ \omega_r(x) = V_r'(x) = 8\pi^2 A \]

so that the generating function is

\[ \varphi_r(\vartheta) = 8\pi^2 A \int_0^\infty e^{-\vartheta x}\,dx = \frac{8\pi^2 A}{\vartheta}. \]

This gives us, by the way, the familiar result:

\[ \bar e_r = -\frac{d\log\varphi_r(\vartheta)}{d\vartheta} = \frac{1}{\vartheta}. \]

The conjugate function $u_r^{(\vartheta)}(x)$ is determined by the
expression

\[ u_r^{(\vartheta)}(x) = \frac{\omega_r(x)e^{-\vartheta x}}{\varphi_r(\vartheta)} = \vartheta e^{-\vartheta x}, \]

which is the approximate expression for the probability density
of the quantity $e_r$. From the above it follows that

\[ P(x_1 < e_r < x_2) = \vartheta \int_{x_1}^{x_2} e^{-\vartheta x}\,dx = e^{-\vartheta x_1} - e^{-\vartheta x_2}. \]

Thus, the rotational energy of a diatomic molecule is subject
to an exponential distribution depending exclusively on the
parameter $\vartheta$.
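
The exponential law for the rotational energy is simple enough to evaluate directly. The sketch below assumes the density $\vartheta e^{-\vartheta x}$ for $x \ge 0$; the function name is illustrative and not taken from the text.

```python
import math


def rotational_energy_probability(x1, x2, theta):
    """P(x1 < e_r < x2) under the density theta * exp(-theta * x), x >= 0."""
    lo = max(x1, 0.0)                 # the energy cannot be negative
    hi = max(x2, 0.0)
    if hi <= lo:
        return 0.0
    return math.exp(-theta * lo) - math.exp(-theta * hi)
```

For instance, the probability that the rotational energy stays below its mean value $1/\vartheta$ is $1 - e^{-1}$, whatever the moment of inertia A.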


25. A system in thermal equilibrium. Canonical distribution
of Gibbs. In the previous discussions we have been considering
G as an isolated system which is not subject to energy
exchange with the surrounding medium; thus the total energy
E was always considered as a constant. It is clear that in actual
physical systems such an assumption can be only approximately
true since any actual physical system, no matter how
well isolated, nevertheless undergoes some kind of energy
interaction with its surroundings.

Another possible idealization consists in considering the system
G as being a component of another much larger system
G*. In this case the component can freely exchange energy
with its surroundings, i.e., with the other parts of the system
G*, and the energy E of the system G must be considered as a
random quantity varying with time. Its distribution law can
be derived on the basis of the general formula which we have
proved earlier. It is clear that the question as to which of the
two idealizations is closer to reality must be decided on the
basis of purely physical considerations in each particular case.

We have seen above, Chapter IV, section 13, that in the
first case (completely isolated system) the fundamental distribution
law for the system G can be obtained as follows: the
point P in the phase space $\Gamma$ of the system G, representing
the state of this system, is always located on the surface $\Sigma_a$
and is distributed with the surface density:

\[ \frac{1}{\Omega(a)\,\operatorname{grad} E}. \tag{63} \]

In the second case (freely interacting system) we arrive at
an entirely different fundamental law. Since G is now a small
component of the system G*, the point P is no longer bound
to any surface of constant energy, but can move freely in the
space $\Gamma$; according to the previously derived distribution law
for the small component (section 20), the point P is distributed
in the space $\Gamma$ according to the probability density whose
approximate form is

\[ \frac{1}{\Phi(\vartheta^*)}\,e^{-\vartheta^* E} \tag{64} \]

(the quantities marked by asterisks pertain to the system
G*, so that the quantity $\vartheta^*$ is a root of the equation
$-[d\log\Phi^*(\vartheta)]/d\vartheta = E^*$).

This latter idealized picture is usually referred to as a system
in thermal equilibrium. A uniform constant temperature in
this case is due to the free interaction between the system G
and its surroundings.

According to Gibbs the fundamental distribution law (63)
corresponding to the first idealized picture is called a
microcanonical distribution, whereas the law (64) corresponding to
the second idealized picture is known as canonical distribution.

The fundamental difference between these two distribution
laws lies in the fact that whereas (63) gives the distribution
on the surface $\Sigma_a$, (64) establishes the distribution in the
entire phase space $\Gamma$.

Let us now consider the canonical distribution (64) in greater
detail. As we know (section 20), the distribution of energy E
in the system G (considered as a small component of the
system G*) is given by the probability density

\[ \frac{\Omega(x)e^{-\vartheta^* x}}{\Phi(\vartheta^*)}, \]

so that:

\[ \bar E = \frac{1}{\Phi(\vartheta^*)} \int x\,\Omega(x)e^{-\vartheta^* x}\,dx = -\frac{d\log\Phi(\vartheta^*)}{d\vartheta^*}. \]

This shows that the parameter $\vartheta^*$ plays the same role in the
second picture as the parameter $\vartheta$ did in the first one, provided
that instead of the fixed value E = a we introduce in the
canonical distribution the quantity $\bar E$ representing the
mathematical expectation of E.

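Assume now that the system G consists of two components
$G_1$ and $G_2$ with the energies $E_1$ and $E_2$. In order to obtain the
distribution law of the system $G_1$ in its phase space $\Gamma_1$, we
must integrate the expression (64) over the entire phase space
$\Gamma_2$. Since $E = E_1 + E_2$ and $\Phi(\vartheta^*) = \Phi_1(\vartheta^*)\Phi_2(\vartheta^*)$ and also
(according to (32) section 16):

\[ \int_{\Gamma_2} e^{-\vartheta^* E_2}\,dV_2 = \Phi_2(\vartheta^*), \]

this integration yields

\[ \int_{\Gamma_2} \frac{e^{-\vartheta^* E}}{\Phi(\vartheta^*)}\,dV_2
   = \frac{e^{-\vartheta^* E_1}}{\Phi(\vartheta^*)} \int_{\Gamma_2} e^{-\vartheta^* E_2}\,dV_2
   = \frac{\Phi_2(\vartheta^*)}{\Phi(\vartheta^*)}\,e^{-\vartheta^* E_1}
   = \frac{e^{-\vartheta^* E_1}}{\Phi_1(\vartheta^*)}, \tag{65} \]

which represents the probability density governing the distribution
of the component $G_1$ in its phase space. Thus, we see that
any component of the canonically distributed system is also
distributed canonically with the same value of the parameter $\vartheta^*$.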

We know that the formula (65) holds also for small components
in the microcanonical distribution as represented by
the fundamental law (63). On the other hand, for the large
components the formula (65) does not hold since it leads to the
energy distribution law

\[ \frac{\Omega_1(E_1)e^{-\vartheta^* E_1}}{\Phi_1(\vartheta^*)}, \]

which differs quite essentially from the formula (52) section
22 for the distribution of the large component of the
microcanonical system.

The relation:

\[ \frac{e^{-\vartheta^* E}}{\Phi(\vartheta^*)} = \frac{e^{-\vartheta^* E_1}}{\Phi_1(\vartheta^*)} \cdot \frac{e^{-\vartheta^* E_2}}{\Phi_2(\vartheta^*)} \]

shows us that in the case of the canonical distribution for a
system consisting of several components the distribution laws
for these components must be combined as if they were mutually
independent groups of random quantities. On the other
hand, in the case of a microcanonical distribution the various
components of a given system are considered as mutually
dependent due to the constancy of the total energy. That is
why, in spite of the fact that for small components the distribution
laws based on (63) and (64) are almost identical, such
is not the case for the large components.

It is clear that the computations based on the law (64) are
considerably simpler than those based on the law (63), since
it is easier to operate with independent random quantities.
Therefore, we must ask ourselves: to what extent can we use,
as an approximation, the fundamental law (64) instead of (63)
for the calculation of the mean values of the phase functions
in the microcanonical distributions?

We have already mentioned that the majority of phase
functions which one encounters in statistical mechanics are
sum functions; i.e., they can be written in the form of a sum
of terms each depending on the coordinates of one molecule
only. Because of the above mentioned similarity in the distribution
laws of small components, the mean value of each term
can be found approximately using the formulae belonging to
the canonical distribution (this was a basic idea in our method
of approximation). But the mean value of the sum is always
equal to the sum of the mean values for dependent as well as
for independent quantities; thus in calculation of the mean
values of the sum functions we can, as an approximation, use
the canonical (64) instead of microcanonical distribution (63).
As mentioned above this change constitutes our method of
approximation.

However, if we are looking for the mean value of a function
which is not a sum function (but, for example, the square of
a sum function), the substitution of (63) by (64) would lead,
generally speaking, to a completely erroneous result; the mean
value of such a function in the microcanonical distribution is
entirely different from its mean value in the canonical distribution.
The trivial example of that kind is given by the dispersion
of the total energy of the given system: this quantity,
which is apparently equal to zero in the microcanonical
distribution, has a definite positive value in the canonical
distribution. In Chapter VIII we will see further examples
of this kind.
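
To make the last remark concrete, the following sketch evaluates the canonical dispersion of the total energy for the monatomic ideal gas. It relies on two assumptions that are labeled here rather than taken from this section: the generating function is proportional to $\vartheta^{-3n/2}$ (formula (54) of section 23), and the dispersion of the energy in the canonical distribution equals $d^2\log\Phi/d\vartheta^2$, a standard property of that distribution. The function name is illustrative.

```python
import math


def canonical_energy_fluctuation(n, mean_energy):
    """Return (dispersion, relative_fluctuation) of E in the canonical law.

    Assumes log Phi(theta) = const - (3n/2) * log(theta), as for the
    monatomic ideal gas.
    """
    theta = 1.5 * n / mean_energy       # from -d(log Phi)/d(theta) = mean energy
    dispersion = 1.5 * n / theta**2     # d^2 log(Phi) / d theta^2
    relative = math.sqrt(dispersion) / mean_energy
    return dispersion, relative
```

The dispersion is strictly positive, in contrast to the microcanonical value zero, while the relative fluctuation $\sqrt{2/(3n)}$ becomes small for large n.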

The relation between the distribution laws (63) and (64) is
unfortunately not clarified sufficiently in the existing texts on
statistical mechanics. Thus, one often says that one can base
the statistical mechanics on either the ergodic theory leading
to the formula (63) or the “hypothesis of canonical distribution”,
i.e. the distribution law (64) which is introduced as a
postulate, and that in the final count both points of view lead
to the same results.

We see now, however, what the actual situation is. The basic
laws (63) and (64) correspond to two entirely different idealized
pictures. Looking for the rational foundation, instead of referring
only to the practical successes, we will see that the
ergodic theory, or some kind of its equivalent, is necessary in
both cases, since the law (64) for a system in thermal equilibrium
can be based only on the law (63) for the isolated system.
Finally the statement that both theories lead to the same
result is correct only within certain limits (for mean values of
the sum functions); beyond these limits the introduction of
quantities taken from one of these idealized pictures into the
other would lead to very serious mistakes.


CHAPTER VI 


IDEAL MONATOMIC GAS 


26. Velocity distribution. Maxwell's law. In the present
chapter we will use the fundamental formulae of the theory
of the ideal monatomic gas as derived in sections 23, 24 of
the previous chapter and will employ the same notation for the
quantities involved. Let us choose one of the molecules of such
a gas with mass $m_i$, and consider one of its velocity components,
for example, $\dot x_i = (1/m_i)p_{x_i}$, where $p_{x_i}$ is the component
of the momentum of the molecule in the x-direction. Our
problem is to find the distribution of the quantity $\dot x_i$. For
this purpose we introduce the function $f(x_i, y_i, z_i, p_{x_i},
p_{y_i}, p_{z_i})$ which is defined in six-dimensional space $\gamma_i$ by the
relations

\[ f = \begin{cases} 1, & \text{if } \dot x_i = \frac{1}{m_i}\,p_{x_i} < x, \\ 0 & \text{otherwise,} \end{cases} \]

where x is an arbitrary real number. It is clear that the mean
value of the function f represents the probability of the inequality
$\dot x_i < x$, or, in other words, gives us the distribution
law of the quantity $\dot x_i$.

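But, according to the formula (48) section 21, chapter V:

\[ \bar f = \int_{\gamma_i} f\,\frac{e^{-\vartheta e_i}}{\varphi_i(\vartheta)}\,dv_i + O\!\left(\frac{1}{n}\right). \tag{66} \]

On the other hand, we have (section 23, chapter V):

\[ e_i = \frac{m_i}{2}(\dot x_i^2 + \dot y_i^2 + \dot z_i^2) + U_i(x_i, y_i, z_i), \qquad
   \varphi_i(\vartheta) = V (2\pi m_i)^{3/2}\,\vartheta^{-3/2}. \tag{67} \]

As we know, the presence of the term $U_i$ in the expression for
$e_i$ permits us to calculate the integral on the right hand side
of the formula (66) by integrating only within the limits of the
vessel containing the gas; within the vessel, however, $U_i = 0$.
Since, finally,

\[ dp_{x_i} = m_i\,d\dot x_i, \qquad dp_{y_i} = m_i\,d\dot y_i, \qquad dp_{z_i} = m_i\,d\dot z_i, \]

we have:

\[ \bar f = \left(\frac{\vartheta m_i}{2\pi}\right)^{3/2} \iiint_{\dot x_i < x} \exp\left[-\frac{\vartheta m_i}{2}(\dot x_i^2 + \dot y_i^2 + \dot z_i^2)\right] d\dot x_i\,d\dot y_i\,d\dot z_i + O\!\left(\frac{1}{n}\right) \]

\[ = \left(\frac{\vartheta m_i}{2\pi}\right)^{3/2} \int_{-\infty}^{x} e^{-\vartheta m_i u^2/2}\,du \left\{\int_{-\infty}^{\infty} e^{-\vartheta m_i u^2/2}\,du\right\}^2 + O\!\left(\frac{1}{n}\right), \]

or, since each of the two complete integrals is equal to
$[(2\pi)/(\vartheta m_i)]^{1/2}$:

\[ \bar f = \left(\frac{\vartheta m_i}{2\pi}\right)^{1/2} \int_{-\infty}^{x} \exp\left[-\frac{\vartheta m_i u^2}{2}\right] du + O\!\left(\frac{1}{n}\right). \]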


Thus we see that the quantity $\dot x_i$ is approximately distributed
according to the Gauss law with the center at zero
and with the dispersion $1/(m_i\vartheta)$. In particular, it means that
the distribution law of velocity components (known as Maxwell's
law) depends on the mass of the molecule. In the case
of the gas mixture the velocity distribution, in contrast to
the energy distribution, is different for different molecules; the
heavy molecules have smaller dispersion than the light ones
which physically means that the former move slower than the
latter (this is quite natural since their kinetic energies must
be the same).
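
The dependence of Maxwell's law on the mass can be illustrated with a few lines of code. The sketch assumes that a velocity component is Gaussian with mean zero and variance $1/(m_i\vartheta)$, which is what the Gaussian integrand $\exp[-\vartheta m_i u^2/2]$ implies; the function names are illustrative and not part of the text.

```python
import math


def velocity_dispersion(mass, theta):
    """Variance of one velocity component, assumed equal to 1/(mass*theta)."""
    return 1.0 / (mass * theta)


def velocity_component_cdf(x, mass, theta):
    """P(velocity component < x) for the Gaussian law with that variance."""
    return 0.5 * (1.0 + math.erf(x * math.sqrt(mass * theta / 2.0)))


def mean_kinetic_energy(mass, theta):
    """Three components, each contributing (mass/2) * variance."""
    return 3.0 * 0.5 * mass * velocity_dispersion(mass, theta)
```

A heavier molecule has a narrower velocity distribution, yet `mean_kinetic_energy` returns $3/(2\vartheta)$ for every mass, in agreement with formula (57) of the previous chapter.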


27. The gas pressure. We know from physics the important
role played by pressure in the theory of gases. It is impossible
to build the statistical theory of monatomic ideal gas without
first defining this physical notion in terms of our mechanical
picture.

Imagine within the vessel containing a given mass of gas a
fixed thin plate with the area S. During the given time interval
$(t, t + \Delta t)$ one side of this plate is subject, generally
speaking, to a number of impacts from gas molecules. Each
of these impacts communicates to the plate a certain impulse.
The sum of the components of these impulses perpendicular to
the plate determines the pressure of the gas on our plate, or
to be more exact, on one of its sides. The mean value of this
impulse referred to unit time and unit area is called the pressure
(at a given point and in a given direction).

To make this definition more exact let us choose on our
plate an arbitrary point P and its arbitrary neighborhood $\Delta s$.
The sum of impulse-components perpendicular to the plate
which fall within the time interval $(t, t + \Delta t)$ into the area
$\Delta s$ depends on the state of gas at the moment t and represents
consequently a phase function of our system. The mean value
of this phase function is a quantity depending on $\Delta t$ and $\Delta s$
only, so that dividing it by $\Delta t\,\Delta s$ and letting the time interval
$\Delta t$ and the area $\Delta s$ approach zero we obtain in the limit the
quantity which we will call the gas pressure at the point P
and in the given direction (as we shall see later, and as it is
natural to expect on the basis of the symmetry considerations,
the gas pressure at the point P does not depend on the direction
chosen).

Since we can choose the coordinate system arbitrarily, we
can assume, without loss of generality, that the area $\Delta s$ is
perpendicular to the x-axis. Suppose a certain molecule has
the velocity components $\dot x, \dot y, \dot z$ at the moment t. Where should
this molecule be at this moment in order that it might strike
the selected side of the plate $\Delta s$ within the time interval
$(t, t + \Delta t)$? It is apparent that it must be within a sloping
cylinder with base $\Delta s$ and height $|\dot x|\,\Delta t$. Furthermore, the axis
of the cylinder must be parallel to the vector $(\dot x, \dot y, \dot z)$, and
it must be constructed on that side of the plate $\Delta s$ which is
subject to molecular impact. Let us assume, to be definite,
that this side of the plate is facing the negative direction of
the x-axis; in this case apparently we must have $\dot x > 0$ since
otherwise impact is impossible. (We are neglecting here the
possibility of collisions between the molecules under consideration
during the time interval $\Delta t$.)

Let $\Omega(x)$ be the structure function of the entire mass of gas,
and $\Omega^{(i)}(x)$ be the structure function of the system which is
obtained by removing the above selected molecule from the
gas. We know (comp. 25) that for a total gas energy E the
probability density for the selected molecule in its phase space
is given by

\[ g = \frac{\Omega^{(i)}(E - e_i)}{\Omega(E)}, \]

where the energy of the selected molecule $e_i$ is a function of
its six dynamic coordinates as determined by the formula (67).
This density g is also a function of the same kind. If the point
$(x_i, y_i, z_i)$ is outside of the vessel containing the gas,
$U_i(x_i, y_i, z_i)$ becomes infinite and g = 0 since $\Omega^{(i)}(E - e_i) = 0$.
Inside of the vessel $U_i(x_i, y_i, z_i) = 0$, and $e_i$ together with
$\Omega^{(i)}(E - e_i)$ are functions of the velocity coordinates $\dot x_i, \dot y_i,
\dot z_i$ only. The same is true for the density g. This function is
determined more exactly by the formula (53) of chapter V,
according to which the quantity $\Omega(E)$ is given by the product
of $V^n$ and a constant (here x = E) whereas the quantity
$\Omega^{(i)}(E - e_i)$ (for the points within the vessel) is equal to the
product of $V^{n-1}$ and a certain quantity depending on $\dot x_i, \dot y_i,
\dot z_i$ only. Therefore:

\[ g = \frac{1}{V}\,\chi(\dot x_i, \dot y_i, \dot z_i) \tag{68} \]

where the form of the function $\chi$ can be determined (this,
however, is of no importance for us at the moment).

The probability that the selected molecule will strike the
area $\Delta s$ within the given time interval $\Delta t$ and with the
velocity components in the intervals $\dot{x}$ to $\dot{x} + d\dot{x}$,
$\dot{y}$ to $\dot{y} + d\dot{y}$, and $\dot{z}$ to $\dot{z} + d\dot{z}$,
can be obtained by integrating the quantity

$$\frac{1}{V}\,\chi(\dot{x}, \dot{y}, \dot{z})\,dx\,dy\,dz\,d\dot{x}\,d\dot{y}\,d\dot{z}$$

over the entire volume of the above described cylinder, i.e.
will be given by

$$\frac{1}{V}\,|\dot{x}|\,\Delta t\,\Delta s\,\chi(\dot{x}, \dot{y}, \dot{z})\,d\dot{x}\,d\dot{y}\,d\dot{z}.$$

In the process of the impact our molecule communicates to
the area $\Delta s$ the impulse $m_i|\dot{x}|$ in the direction of the
x-axis ($m_i$ being the mass of the molecule). Remembering that this
impulse is zero if at the moment $t$ our molecule is located
outside of the above described cylinder, we obtain for the
mathematical expectation of the mean impulse communicated
by the selected molecule to the area $\Delta s$ in the direction $Ox$
during the time interval $\Delta t$ the expression

$$\frac{m_i\,\Delta t\,\Delta s}{V}\iiint \dot{x}^2\,\chi(\dot{x}, \dot{y}, \dot{z})\,d\dot{x}\,d\dot{y}\,d\dot{z} \tag{69}$$

where the integration is performed only for the positive values
of $\dot{x}$ (and all values of $\dot{y}$ and $\dot{z}$).

We have been speaking so far about the first phase of impact
during which the velocity of the incident molecule drops from
$\dot{x}$ to zero and it communicates its original impulse in the
direction $Ox$ to the plate. This will be followed by a second
phase during which the plate communicates to the molecule
an impulse in the opposite direction. We can calculate the
mathematical expectation of this new impulse in the same way,
with the only difference that the integral in the expression (69)
must be taken along the negative axis $Ox$.

In order to obtain the mathematical expectation of the total
impulse communicated to the area $\Delta s$ during the time $\Delta t$ in
the direction $Ox$ we must add both results, thus obtaining an
expression of the type (69) in which the integration is extended
for all coordinates within the limits $(-\infty, +\infty)$.

The integral (69) can be evaluated approximately without
detailed calculation, since the integral

$$\frac{m_i}{2}\iiint (\dot{x}^2 + \dot{y}^2 + \dot{z}^2)\,\chi(\dot{x}, \dot{y}, \dot{z})\,d\dot{x}\,d\dot{y}\,d\dot{z} \tag{70}$$

represents the mean value of the kinetic energy of the selected
molecule,¹ which, according to formula (57) of chapter V, is equal
to $3/(2\vartheta)$. Because of the symmetry in the three velocity
components, the integral in (69) is equal to one third of the integral
in (70), thus being equal to $1/(m_i\vartheta)$.
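
The value $1/(m_i\vartheta)$ can be checked numerically. The sketch
below is only an illustration: it assumes that $\chi$ has the
Maxwellian form proportional to
$\exp[-\vartheta m(\dot{x}^2 + \dot{y}^2 + \dot{z}^2)/2]$ (the text leaves
the form of $\chi$ open at this point), and the function name and the
numerical values are hypothetical.

```python
import math

def second_moment_ratio(m, theta, half_width=12.0, steps=20000):
    """Approximate I2/I0 by the midpoint rule, where
    I2 = integral of v^2 * exp(-theta*m*v^2/2) dv and
    I0 = integral of       exp(-theta*m*v^2/2) dv.
    The integrals over the other two velocity components cancel
    in the ratio, so a one-dimensional integral is enough."""
    sigma = 1.0 / math.sqrt(m * theta)      # width of the assumed Gaussian weight
    a = -half_width * sigma
    h = 2.0 * half_width * sigma / steps
    i0 = i2 = 0.0
    for j in range(steps):
        v = a + (j + 0.5) * h
        w = math.exp(-0.5 * theta * m * v * v)
        i0 += w * h
        i2 += v * v * w * h
    return i2 / i0

m, theta = 2.0, 0.5                 # arbitrary units
mean_vx2 = second_moment_ratio(m, theta)
print(mean_vx2, 1.0 / (m * theta))  # the two numbers should nearly agree
```

Under this assumption the mass cancels when the ratio is multiplied by
$m_i$, which is why the expected impulse does not depend on the mass.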

Consequently the mathematical expectation of the impulse
becomes approximately

$$\frac{\Delta s\,\Delta t}{V\vartheta}$$

or

$$\frac{1}{V\vartheta}$$

per unit time per unit area. (In particular we notice that it
does not depend on the mass of the selected molecule.)

If the number of molecules is $n$, the mean value of the total
impulse communicated to the plate (in the normal direction)
per unit area per unit time is:

$$p = \frac{n}{\vartheta V} \tag{71}$$

which is usually known as the pressure of the gas. We see that
the gas pressure does not depend on the location or the
orientation of the plate, being only a function of the volume, the
number of molecules and the total energy of the gas (since
$\vartheta$ is a function of the total energy determined by the
formula (55) of chapter V).

¹In fact, according to (68), this mean value is represented by

$$\frac{m_i}{2}\int (\dot{x}^2 + \dot{y}^2 + \dot{z}^2)\,q\,dx\,dy\,dz\,d\dot{x}\,d\dot{y}\,d\dot{z} = \frac{m_i}{2}\iiint (\dot{x}^2 + \dot{y}^2 + \dot{z}^2)\,\chi\,d\dot{x}\,d\dot{y}\,d\dot{z}.$$
From (71) we obtain:

$$pV = \frac{n}{\vartheta}$$

which coincides with the well known formula of Clapeyron if
we put $\vartheta = 1/kT$, where $k$ is Boltzmann's constant and $T$
the absolute temperature. In the next section we will discuss
the validity of such a physical interpretation, and will now
remark only that the relation

$$\frac{n}{V} = p\vartheta$$

(obtainable from (71)), in which $\vartheta$ is considered as some
universal function of temperature, leads to the well known
Avogadro law: for equal temperature and pressure all gases
contain an equal number of molecules per unit volume.
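
As a small numerical illustration of the Avogadro law, the sketch
below evaluates $n/V = p\vartheta$ under the identification
$\vartheta = 1/kT$ (which at this stage is only a postulate). The
chosen pressure and temperature are illustrative, and the function
name is hypothetical.

```python
BOLTZMANN_K = 1.380649e-23   # J/K, exact SI value

def number_density(pressure_pa, temperature_k):
    """Molecules per cubic metre from n/V = p*theta with theta = 1/(k*T)."""
    theta = 1.0 / (BOLTZMANN_K * temperature_k)
    return pressure_pa * theta

# The molecular mass does not enter the computation at all.
density = number_density(101325.0, 273.15)
print(f"{density:.4e} molecules per cubic metre")
```

No property of the particular gas appears among the arguments, which
is the content of the law.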


28. Physical interpretation of the parameter $\vartheta$. We have
seen that the parameter $\vartheta$ plays a very important role in
statistical systems; it enters in an essential manner into the
expressions for any physical characteristics of the system.

On the other hand the notion of the temperature which
plays, as is well known, the fundamental role in
thermodynamics has not found as yet any interpretation in our
mechanical theory; in our attempts to build a purely mechanical
theory of the heat processes we must try to find an
interpretation of this physical notion in terms of the notions used above.

This comparison justifies an attempt to connect the physical
meaning of the parameter $\vartheta$ with the temperature $T$ of the
system, although, of course, it does not justify postulating a
universal functional dependence between $\vartheta$ and $T$. We have
seen in the previous section that in the case of a monatomic
ideal gas one can give definite arguments in favor of such a
postulate, and that in this case one can arrive at the relation

$$\vartheta = \frac{1}{kT} \tag{72}$$

where $k$ is Boltzmann's constant. However, even within the
theory of the ideal monatomic gas it would be reasonable to
explore the immediate consequences of the above postulate
before accepting it. Thus we could substitute the expression
(72) for $\vartheta$ in the distributions so far obtained (for example,
in section 26); we would see in this case that these
distributions coincide exactly with those derived in physics on
the basis of entirely different considerations.

Since the postulate (72) demands a careful check even in
the simple theory of the ideal monatomic gas, it is clear that
at the present moment we can say hardly anything more
definite about the general dependence between $\vartheta$ and $T$ in the
case of more complicated physical systems. We will see,
however, in the next chapter that the interpretation of the
fundamental laws of thermodynamics on the basis of our general
mechanical theory will lead quite naturally to a generalization
of the postulate (72) for a very broad class of physical systems.

In the present chapter we will give only one additional
argument in favor of a universal dependence between $\vartheta$ and the
temperature of the system.

Suppose we have two physical systems with temperatures
$T_1$ and $T_2$ which are characterized by the values $\vartheta_1$ and
$\vartheta_2$ of our statistical parameter.

If we unite the two systems $(T_1, \vartheta_1)$ and $(T_2, \vartheta_2)$
into one system $(T, \vartheta)$ (i.e. if we assume that these two
systems are interacting with one another), the final temperature $T$
of the composite system will lie between $T_1$ and $T_2$; in particular,
in the case when $T_1$ is equal to $T_2$ we will have $T = T_1 = T_2$. If we
assume that $\vartheta$ is a monotonic function of temperature, its value
for the composite system must be intermediate between $\vartheta_1$ and
$\vartheta_2$. It is easy to see that $\vartheta$ actually possesses this
property.

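In fact let $\Phi_1(\alpha)$, $\Phi_2(\alpha)$ and $\Phi(\alpha)$ be the
generating functions of the two components and the composite system,
and $E_1$, $E_2$ and $E$ the corresponding total energies (so that
$E = E_1 + E_2$ and $\Phi(\alpha) = \Phi_1(\alpha)\Phi_2(\alpha)$).

We know that:

$$\left(\frac{d\log\Phi_1}{d\alpha}\right)_{\alpha=\vartheta_1} = -E_1, \qquad \left(\frac{d\log\Phi_2}{d\alpha}\right)_{\alpha=\vartheta_2} = -E_2, \qquad \left(\frac{d\log\Phi}{d\alpha}\right)_{\alpha=\vartheta} = -E,$$

so that

$$\left(\frac{d\log\Phi}{d\alpha}\right)_{\alpha=\vartheta} = \left(\frac{d\log\Phi_1}{d\alpha}\right)_{\alpha=\vartheta_1} + \left(\frac{d\log\Phi_2}{d\alpha}\right)_{\alpha=\vartheta_2}. \tag{73}$$

On the other hand, for arbitrary $\alpha$, and, in particular, for
$\alpha = \vartheta$,

$$\left(\frac{d\log\Phi}{d\alpha}\right)_{\alpha=\vartheta} = \left(\frac{d\log\Phi_1}{d\alpha}\right)_{\alpha=\vartheta} + \left(\frac{d\log\Phi_2}{d\alpha}\right)_{\alpha=\vartheta}. \tag{74}$$

Let us assume for the moment that $\vartheta$ lies outside of the
interval $(\vartheta_1, \vartheta_2)$, being, for example, larger than
both $\vartheta_1$ and $\vartheta_2$. Since according to the property 4
(chapter IV, section 16) the logarithmic derivatives of the generating
functions increase with increasing value of the argument, we would get

$$\left(\frac{d\log\Phi_1}{d\alpha}\right)_{\alpha=\vartheta} > \left(\frac{d\log\Phi_1}{d\alpha}\right)_{\alpha=\vartheta_1}, \qquad \left(\frac{d\log\Phi_2}{d\alpha}\right)_{\alpha=\vartheta} > \left(\frac{d\log\Phi_2}{d\alpha}\right)_{\alpha=\vartheta_2};$$

thus for $\alpha = \vartheta$ the right hand side of the equation (74) is
larger than the right hand side of the equation (73), whereas
their left hand sides are equal to one another. This
contradiction proves the incorrectness of the above assumption. This
argument holds for any physical system of the
general type considered above.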
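
For two monatomic ideal gases the intermediate-value property can be
verified directly. The sketch below assumes only the relation
$E = 3n/(2\vartheta)$ (each molecule carries the mean kinetic energy
$3/(2\vartheta)$ quoted in section 27); the function names and the
numerical values are hypothetical.

```python
def theta_ideal_gas(n, energy):
    """theta for a monatomic ideal gas, from E = 3n / (2*theta)."""
    return 3.0 * n / (2.0 * energy)

def composite_theta(n1, e1, n2, e2):
    """theta of the united system: molecule numbers and energies add."""
    return theta_ideal_gas(n1 + n2, e1 + e2)

n1, e1 = 1000, 300.0     # illustrative values, arbitrary units
n2, e2 = 4000, 6000.0
t1 = theta_ideal_gas(n1, e1)
t2 = theta_ideal_gas(n2, e2)
t = composite_theta(n1, e1, n2, e2)
print(t1, t2, t)         # t should lie between t1 and t2
```

In this special case the result is just the elementary fact that
$(n_1 + n_2)/(E_1 + E_2)$ lies between $n_1/E_1$ and $n_2/E_2$; the
argument in the text does not depend on that explicit form.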


29. Gas pressure in an arbitrary field of force. We will
now return to the case of the ideal monatomic gas, but will
assume that its molecules are subject to the action of an
external field of force, and that the potential energy of each
molecule is a function only of its position (i.e. independent of
its velocity and of the positions and velocities of other
molecules).

In other words we will assume that the energy of
each individual molecule can be written in the form

$$e_i = \frac{m_i}{2}(\dot{x}_i^2 + \dot{y}_i^2 + \dot{z}_i^2) + \varepsilon_i(x_i, y_i, z_i), \tag{75}$$

where $\varepsilon_i(x_i, y_i, z_i)$ is the potential energy in the given field.

We will define the gas pressure $p$ at a given point and in a
given direction in the same way as was done in section 27.
The distribution law for a selected molecule of our gas can be
written (in the notation of section 27) in the form:

$$\frac{\Omega^{(i)}(E - e_i)}{\Omega(E)} \tag{76}$$

where $e_i$ (the energy of the selected molecule) is a function
of the six dynamic coordinates of that molecule as determined by
the formula (75), and $\Omega^{(i)}(x)$ is the structure function of the
system consisting of the entire mass of the gas minus the selected
molecule. At a certain moment $t$ the selected molecule is
located within a certain cell of the phase space; in order that
this molecule should hit the area $\Delta s$ within the time interval
$(t, t + \Delta t)$ it is necessary and sufficient that it lie within a
cylinder with the base $\Delta s$ and the height $\dot{x}\,\Delta t$
($\dot{x} > 0$). The impulse communicated by this molecule to the area
$\Delta s$ is $m_i\dot{x}$. We are neglecting here the effect of the
collisions between the molecules as well as the possible deviations
from rectilinear motion caused by the external field; in fact, these
effects vanish when $\Delta s$ and $\Delta t$ approach zero.

Thus, in order to calculate the value of the mean impulse
communicated by the selected molecule to the area $\Delta s$ in the
direction $Ox$ during the time interval $\Delta t$, we must integrate
the product of $m_i\dot{x}_i$ and (76) over the six dimensional phase
space of the selected molecule; the integration must be carried
over the region in which $\dot{x}_i > 0$, and the limits of integration
are determined by the condition that the selected molecule lies
within the above described cylindrical volume. Since all
dimensions of this cylinder are small quantities of the order of
$\Delta s$ and $\Delta t$, we can consider the values of the space
coordinates entering into the integral as certain fixed quantities (for
example, making them equal to the coordinates of a certain point
$P$ within the area $\Delta s$ towards which this area contracts
in the limiting case). We can repeat here all the arguments of
section 27 pertaining to the second phase of the impact, and
come to the conclusion that in order to calculate the value of
the total impulse we must extend the integration along the
entire real axis $Ox$. Thus we can write for the mean value of
the impulse

$$\frac{1}{\Omega(E)}\iiint m_i\dot{x}_i\cdot\dot{x}_i\,\Delta t\,\Delta s\;\Omega^{(i)}(E - e_i)\,dp_x\,dp_y\,dp_z = \frac{m_i^4\,\Delta t\,\Delta s}{\Omega(E)}\iiint \dot{x}_i^2\,\Omega^{(i)}(E - e_i)\,d\dot{x}_i\,d\dot{y}_i\,d\dot{z}_i$$

or

$$\frac{m_i^4}{\Omega(E)}\iiint \dot{x}_i^2\,\Omega^{(i)}(E - e_i)\,d\dot{x}_i\,d\dot{y}_i\,d\dot{z}_i$$

per unit time and unit area. As already mentioned, the integration
must be extended through the entire three dimensional space
$(\dot{x}_i, \dot{y}_i, \dot{z}_i)$, whereas the space coordinates
$x_i, y_i, z_i$ may be set equal to the coordinates $x, y, z$ of the
point $P$ at which the gas pressure is being calculated. In order to
get the pressure of the gas we have to take the sum of the above
expressions for all $n$ molecules of the gas. This gives:

$$p = \frac{1}{\Omega(E)}\sum_{i=1}^{n} m_i^4\iiint \dot{x}_i^2\,\Omega^{(i)}(E - e_i)\,d\dot{x}_i\,d\dot{y}_i\,d\dot{z}_i.$$

It is clear that in this more general case the pressure of the
gas is different at different points, since the above expression
for $p$ depends on the coordinates $x, y, z$ of the point $P$.

In deriving an approximate expression we will consider
only the leading terms, leaving to the reader the estimate of the
corresponding errors (which can be done easily on the basis of the
formulas in chapter V).

To approximate the expression (76) we will use the formula (45) of 
chapter V. (Although this formula was established for values of e 
deviating from the mean value by an amount of the order of 0 (n H, 
it is easy to verify that the part of the integral in the expression for $p$
which corresponds to the large deviations is negligibly small as
compared with the main part. Thus in the approximate evaluation of this
integral we can either neglect this part or, what is more convenient,
use for the integrated function the same expression as in the main
part.) That gives us:

$$p = \sum_{i=1}^{n}\frac{m_i^4}{\varphi_i(\vartheta)}\iiint \dot{x}_i^2\,e^{-\vartheta e_i}\,d\dot{x}_i\,d\dot{y}_i\,d\dot{z}_i$$

where

$$\varphi_i(\vartheta) = \int e^{-\vartheta e_i}\,dv_i$$

is the generating function of the selected molecule.

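We now introduce for the energy $e_i$ its expression given by
(75), remembering that in this expression the coordinates
$x_i, y_i, z_i$ of the selected molecule must be replaced by the
coordinates $x, y, z$ of the point $P$. Since

$$\varphi_i(\vartheta) = \int e^{-\vartheta e_i}\,dv_i = \iiint \exp[-\vartheta\varepsilon_i(x_i, y_i, z_i)]\,dx_i\,dy_i\,dz_i \iiint \exp\left[-\frac{\vartheta m_i}{2}(\dot{x}_i^2 + \dot{y}_i^2 + \dot{z}_i^2)\right]dp_x\,dp_y\,dp_z$$

(where $p_x = m_i\dot{x}_i$, $p_y = m_i\dot{y}_i$, $p_z = m_i\dot{z}_i$) we
obtain, after obvious simplifications:

$$p = \sum_{i=1}^{n} m_i\,\frac{\int \dot{x}_i^2\exp(-\vartheta m_i\dot{x}_i^2/2)\,d\dot{x}_i}{\int \exp(-\vartheta m_i\dot{x}_i^2/2)\,d\dot{x}_i}\cdot\frac{\exp[-\vartheta\varepsilon_i(x, y, z)]}{\iiint \exp[-\vartheta\varepsilon_i(x_i, y_i, z_i)]\,dx_i\,dy_i\,dz_i}$$

or, since

$$\frac{\int \dot{x}_i^2\exp(-\vartheta m_i\dot{x}_i^2/2)\,d\dot{x}_i}{\int \exp(-\vartheta m_i\dot{x}_i^2/2)\,d\dot{x}_i} = \frac{1}{m_i\vartheta},$$

we get finally

$$p = \frac{1}{\vartheta}\sum_{i=1}^{n}\frac{\exp[-\vartheta\varepsilon_i(x, y, z)]}{\iiint \exp[-\vartheta\varepsilon_i(x_i, y_i, z_i)]\,dx_i\,dy_i\,dz_i}. \tag{77}$$

In the particular case considered in section 27, $\varepsilon_i$ is equal to
zero inside of the vessel and to $+\infty$ outside of it, which gives
us:

$$p = \frac{n}{\vartheta V}$$

where $V$ is the volume of the vessel. This is exactly the
expression which we obtained before.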
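
The general pressure formula
$p = \vartheta^{-1}\sum_i \exp[-\vartheta\varepsilon_i(x, y, z)] / \iiint \exp[-\vartheta\varepsilon_i]\,dx_i\,dy_i\,dz_i$
can be evaluated by simple quadrature. The sketch below is an
illustration under added assumptions: all molecules are identical, the
vessel is a vertical column of constant cross-section, and the potential
depends on the height only. The function name and the numerical values
are hypothetical.

```python
import math

def pressure(theta, n, area, height, potential, z, steps=20000):
    """Evaluate p(z) = (n/theta) * exp(-theta*eps(z)) / Z for identical
    molecules in a vertical column, where
    Z = area * integral_0^height exp(-theta*eps(z')) dz'
    is computed by the midpoint rule."""
    h = height / steps
    integral = sum(
        math.exp(-theta * potential((j + 0.5) * h)) for j in range(steps)
    ) * h
    return (n / theta) * math.exp(-theta * potential(z)) / (area * integral)

theta, n, area, height = 2.0, 500, 3.0, 4.0   # arbitrary units
p_free = pressure(theta, n, area, height, lambda z: 0.0, z=1.0)
print(p_free, n / (theta * area * height))    # field-free case: n/(theta*V)
```

With a vanishing potential the quadrature reproduces $n/(\vartheta V)$;
passing any other function of the height as `potential` gives the
corresponding position-dependent pressure.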

We will consider as an example the behavior of a gas
enclosed in a vessel and subjected to the action of gravitational
force along the negative z direction. In this case the potential
energy is

$$\varepsilon_i(x, y, z) = m_i g z + U(x, y, z)$$

where $U(x, y, z) = 0$ inside the vessel and $= +\infty$ outside of it.

According to the formula (45) of chapter V the distribution law
of a single molecule is given by the approximate expression of
the density:

$$\frac{1}{\varphi_i(\vartheta)}\exp\left[-\frac{\vartheta m_i}{2}(\dot{x}_i^2 + \dot{y}_i^2 + \dot{z}_i^2) - \vartheta m_i g z_i\right] \tag{78}$$

inside the vessel (outside the vessel the density is, of course,
equal to zero).

Assume for simplicity that all molecules have the same mass
$m_i = m$. Let us select an arbitrary point $P_0(x_0, y_0, z_0)$ and
imagine it to be surrounded by an elementary volume $dv_0$.
The probability that a given molecule will fall within this volume
can be obtained by multiplying $dv_0$ by the expression (78)
integrated over all three velocity coordinates from $-\infty$ to $+\infty$
(here, of course, $z_i = z_0$). This probability will, obviously, have
the form

$$A e^{-\vartheta m g z_0}\,dv_0$$

where $A$ is a function of $m$ and $\vartheta$. The mean number of
molecules which fall within the volume $dv_0$ is therefore:

$$nA e^{-\vartheta m g z_0}\,dv_0.$$

At another point $P(x, y, z)$ within the volume $dv$ this number
is given by:

$$nA e^{-\vartheta m g z}\,dv.$$

If $dv = dv_0$ the ratio of these numbers (i.e., the density relative
to $P_0$) is equal to

$$\exp[-mg\vartheta(z - z_0)].$$

If we put, as above, $\vartheta = 1/kT$, we obtain for the relative
density of the gas the well known "barometric" formula

$$\exp\left[-\frac{mg(z - z_0)}{kT}\right]$$

which is usually derived in physics on the basis of entirely
different considerations. It is clear that formula (77) (for
$m_i = m$) leads us to an identical expression for the relative
gas pressure.
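
The barometric formula lends itself to a short numerical sketch. The
code below assumes $\vartheta = 1/kT$, a vertical column of constant
cross-section, and nitrogen at 300 K as an illustrative gas; the
function names and all numerical choices are hypothetical. It also
checks a consequence of the pressure formula: the difference between
the pressures at the bottom and at the top of the column equals the
weight of the gas per unit area.

```python
import math

BOLTZMANN_K = 1.380649e-23       # J/K, exact SI value
ATOMIC_MASS = 1.66053906660e-27  # kg
G = 9.80665                      # m/s^2, standard gravity

def relative_density(m, temperature, z, z0=0.0, g=G):
    """Barometric ratio exp[-m*g*(z - z0)/(k*T)]."""
    return math.exp(-m * g * (z - z0) / (BOLTZMANN_K * temperature))

def pressure_in_column(n, m, temperature, area, height, z, g=G):
    """Pressure at height z from p = (n/theta) * exp(-a*z) / Z, with
    Z = area * (1 - exp(-a*height)) / a and a = theta*m*g."""
    theta = 1.0 / (BOLTZMANN_K * temperature)
    a = theta * m * g
    z_integral = area * (1.0 - math.exp(-a * height)) / a
    return (n / theta) * math.exp(-a * z) / z_integral

m_n2 = 28.0134 * ATOMIC_MASS     # mass of a nitrogen molecule, illustrative
scale_height = BOLTZMANN_K * 300.0 / (m_n2 * G)
print(scale_height)              # height over which the density falls by a factor e

n, area, height = 1.0e25, 2.0, 5000.0
dp = (pressure_in_column(n, m_n2, 300.0, area, height, 0.0)
      - pressure_in_column(n, m_n2, 300.0, area, height, height))
print(dp, n * m_n2 * G / area)   # pressure drop versus weight per unit area
```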


CHAPTER VII 


THE FOUNDATION OF THERMODYNAMICS 


30. External parameters and the mean values of external
forces. In the previous chapters we have often considered the
case where the energy of the molecules forming the given
physical system depends not only on the dynamic coordinates
of these molecules, but also on a number of parameters
characterizing the position or the state of external bodies acting
on the system in question. Thus, for example, in the previous
section the quantity $g$, characterizing the gravitational field,
entered in a natural way into all the pertinent preceding
formulae. In other cases such parameters can be represented,
for example, by the coordinates of some attraction or repulsion
centers. In the future such parameters will be referred to as
external parameters. Mathematically such an external parameter
is characterized by the fact that it has the same form in
the energy expressions for all molecules.

However, in all the above considerations, we have assumed
that the values of the external parameters always remain
constant; we will now concentrate on the cases when the
external parameters change with time. We remark that the
energies $e_i$ of the individual molecules as well as the total
energy $E = \sum e_i$ of the system are functions of the external
parameters, which we will in general denote by
$\lambda_1, \ldots, \lambda_r$. A change of the external parameters (such
as the change of the field of forces, the change of the position of the
attraction or repulsion centers, etc.) will result, generally speaking, in
a change in the energy of the system; the point representing
the system in its phase space will in this case execute a
transition from one surface of constant energy to another. This
change of energy is due to the work done by such external
forces as change these parameters.

The quantity $\partial e_i/\partial\lambda_s$ will be called the generalized
external force acting on the i-th molecule "in the direction" of the
parameter $\lambda_s$. Similarly the quantity

$$X_s = \frac{\partial E}{\partial\lambda_s} = \sum_{i=1}^{n}\frac{\partial e_i}{\partial\lambda_s}$$

will be called the generalized external force in the direction of
the parameter $\lambda_s$ acting on the entire system. Apart from the
parameters $\lambda_1, \ldots, \lambda_r$, the quantity $X_s$ depends on all the
dynamical coordinates of the given system, i.e., on the exact
position of the representative point in the phase space. In
particular, the quantity $X_s$ may have entirely different values
for two different points on the same surface of constant
energy. Thus from the point of view of our theory the quantity
$X_s$ is a phase function, or, in the terminology of the theory of
probability, a random quantity. Thus it is natural to ask
ourselves about the mean value $\bar{X}_s$ of this quantity on a given
energy surface. According to the formula (48) of chapter V we
have, approximately:

$$\overline{\frac{\partial e_i}{\partial\lambda_s}} \approx \frac{1}{\varphi_i(\vartheta)}\int_{\gamma_i}\frac{\partial e_i}{\partial\lambda_s}\,e^{-\vartheta e_i}\,dv_i$$

where $\gamma_i$, $\varphi_i$, and $dv_i$ stand for the phase space, the
generating function, and the element of volume in the phase space of the
selected molecule, whereas the quantity $\vartheta$ is related to the
total energy $E$ of the system in the same way as in the previous
cases.

But since

$$\varphi_i(\vartheta) = \int_{\gamma_i} e^{-\vartheta e_i}\,dv_i$$

we have

$$\frac{1}{\varphi_i(\vartheta)}\int_{\gamma_i}\frac{\partial e_i}{\partial\lambda_s}\,e^{-\vartheta e_i}\,dv_i = -\frac{1}{\vartheta}\frac{\partial\log\varphi_i}{\partial\lambda_s}$$

and consequently

$$\overline{\frac{\partial e_i}{\partial\lambda_s}} \approx -\frac{1}{\vartheta}\frac{\partial\log\varphi_i}{\partial\lambda_s}$$

from which follows, approximately,

$$\bar{X}_s = \sum_{i=1}^{n}\overline{\frac{\partial e_i}{\partial\lambda_s}} = -\frac{1}{\vartheta}\sum_{i=1}^{n}\frac{\partial\log\varphi_i}{\partial\lambda_s} = -\frac{1}{\vartheta}\frac{\partial\log\Phi}{\partial\lambda_s} \tag{79}$$

where $\Phi = \Phi(\vartheta)$ is the generating function of the system,
depending, apart from $\vartheta$, on all external parameters. The above
formula represents an extremely simple expression for the
desired mean value.

The element of work done by the external forces when the
external parameters change by $d\lambda_1, \ldots, d\lambda_r$ will be defined,
as usual, by the expression:

$$\delta A = \sum_{s=1}^{r} X_s\,d\lambda_s.$$

Similar to the quantities $X_s$, the quantity $\delta A$ is a certain phase
function (a random quantity). Calculating the mean value of
$\delta A$ for a given value $E$ of the total energy of the system, we
obtain:

$$\overline{\delta A} = \sum_{s=1}^{r}\bar{X}_s\,d\lambda_s = -\frac{1}{\vartheta}\sum_{s=1}^{r}\frac{\partial\log\Phi}{\partial\lambda_s}\,d\lambda_s. \tag{80}$$

It is important to remember that the sum on the right hand
side of the above expression does not represent the total
differential of the function $\log\Phi$, since, apart from the external
parameters, this function also depends on the parameter $\vartheta$.
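
The relation between the mean generalized force and the logarithmic
derivative of the generating function can be tested on the
gravitational parameter $g$ of section 29. The sketch below assumes a
single molecule in a column of height $h$, for which
$\partial e_i/\partial g = m z$ and the $g$-dependent factor of
$\varphi_i$ is $\int_0^h e^{-\vartheta m g z}\,dz$. The function names and
the numerical values are hypothetical.

```python
import math

def mean_height(theta, m, g, height, steps=20000):
    """Mean of z under the weight exp(-theta*m*g*z) on 0 <= z <= height."""
    h = height / steps
    num = den = 0.0
    for j in range(steps):
        z = (j + 0.5) * h
        w = math.exp(-theta * m * g * z)
        num += z * w
        den += w
    return num / den

def log_phi_z(theta, m, g, height):
    """g-dependent part of log(phi_i): the logarithm of
    integral_0^height exp(-theta*m*g*z) dz, in closed form."""
    a = theta * m * g
    return math.log((1.0 - math.exp(-a * height)) / a)

theta, m, g, height = 1.5, 2.0, 0.7, 3.0   # arbitrary units
dg = 1e-5
dlog_dg = (log_phi_z(theta, m, g + dg, height)
           - log_phi_z(theta, m, g - dg, height)) / (2.0 * dg)
force_from_phi = -dlog_dg / theta                     # -(1/theta) d(log phi)/dg
force_direct = m * mean_height(theta, m, g, height)   # mean of de/dg = m*z
print(force_from_phi, force_direct)
```

The two numbers agree to the accuracy of the quadrature and of the
finite difference, the second being computed without any reference to
the generating function.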


31. The volume of the gas as an external parameter. One
of the most important parameters encountered in the study of
gases is the volume of the vessel containing the gas. This
volume can usually be considered as a function of only one or
a few external parameters. Let us consider the simple case of a
gas enclosed in a cylindrical vessel with a movable piston; here
the shape of the vessel is completely determined by the volume
$V$. Thus, if the only forces acting on the gas are due to the
reaction of the walls, the function $U(x_i, y_i, z_i)$, representing
the potential energy of the molecule at the point $(x_i, y_i, z_i)$,
is completely determined by the quantity $V$; this justifies
consideration of the quantity $V$ as the external parameter of our
system.

Let us find the expression for the mean value of the
generalized force acting along this parameter. If we consider the case
of an ideal gas, and use the formula (54), we obtain

$$\Phi(\vartheta) = \left(\frac{2\pi}{\vartheta}\right)^{3n/2}\prod_{i=1}^{n} m_i^{3/2}\;V^n.$$

According to the formula (79) of the previous section, the mean
value of the force becomes

$$-\frac{1}{\vartheta}\frac{\partial\log\Phi}{\partial V} = -\frac{n}{\vartheta V}.$$

The above expression is equal in magnitude and opposite in
sign to the expression for the pressure of the gas derived in
the previous chapter. Thus we can consider the gas pressure
as the mean value of the force with which this gas acts on the
external bodies "in the direction" of the parameter $V$. In
particular, the mean value of the elementary work done by
the gas when its volume changes by $dV$ can be written as:

$$-\overline{\delta A} = p\,dV.$$
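
The identification of the pressure with the mean generalized force
along $V$ can be confirmed by differentiating $\log\Phi$ numerically.
The sketch below uses the ideal-gas generating function
$\Phi = (2\pi/\vartheta)^{3n/2}\prod m_i^{3/2}\,V^n$; the function names,
the masses and the other numerical values are hypothetical.

```python
import math

def log_Phi(theta, volume, masses):
    """log of Phi = (2*pi/theta)^(3n/2) * prod(m_i^(3/2)) * V^n."""
    n = len(masses)
    return (1.5 * n * math.log(2.0 * math.pi / theta)
            + 1.5 * sum(math.log(m) for m in masses)
            + n * math.log(volume))

def mean_force_volume(theta, volume, masses, dv=1e-6):
    """-(1/theta) * d(log Phi)/dV by a central difference."""
    d = (log_Phi(theta, volume + dv, masses)
         - log_Phi(theta, volume - dv, masses)) / (2.0 * dv)
    return -d / theta

masses = [1.0, 2.0, 4.0, 4.0, 8.0]      # illustrative, arbitrary units
theta, volume = 0.8, 2.5
x_v = mean_force_volume(theta, volume, masses)
p = len(masses) / (theta * volume)       # pressure n/(theta*V)
print(x_v, -p)
```

The masses enter $\log\Phi$ only through a term independent of $V$, so
they drop out of the derivative, in agreement with the remark in
section 27 that the pressure does not depend on the molecular masses.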


32. The second law of thermodynamics. The science of
thermodynamics is based essentially on its two fundamental
laws; thus, every theory claiming to represent the foundation
of thermodynamics must prove that these two fundamental
laws can be derived from its basic principles. Once this is done,
the entire system of thermodynamic theory can be developed
logically as a consequence of the two laws.

The first fundamental law of thermodynamics is the law of
conservation of energy; it is clear that any mechanical
foundation of the theory of heat includes this law quite automatically,
since the law of conservation of energy represents the first
integral of the equations of motion.

We encounter an entirely different situation in the case of
the second fundamental law which, in the frame of the
mechanical theory, presents a theorem subject to mathematical
proof.

In the customary (non statistical) treatment of
thermodynamics the state of a physical system is usually characterized
by a set of external parameters and by the temperature of
the system. We have seen before that there are many reasons
to assume that the parameter $\vartheta$ is directly connected with the
notion of the temperature of the system, and that, on the other
hand, a knowledge of $\vartheta$ is equivalent to a knowledge of the
total energy $E$ of the system. Thus, in the frame of our theory,
the state of the system is completely determined if, in addition
to the value of the external parameters, we also know the
surface of constant energy which contains the representative
point of our system. Thus, in the classical treatment, we do
not distinguish between the states of the system represented
by different points on the same energy surface. Because of this
fact we will agree to use the term "a thermodynamic function"
for any quantity which is completely determined by the parameters
$\vartheta, \lambda_1, \ldots, \lambda_r$. We have already encountered such
"thermodynamic functions" in many places in the previous discussion.
As an example, we can mention the generating function $\Phi(\vartheta)$,
the total energy of the system $E = -(\partial\log\Phi)/\partial\vartheta$ (we use
here the partial derivative to underline the fact that the values
of the parameters $\lambda_1, \ldots, \lambda_r$ must be kept constant) and finally
the mean values of the acting forces

$$\bar{X}_s = -\frac{1}{\vartheta}\frac{\partial\log\Phi(\vartheta)}{\partial\lambda_s} \qquad (1 \le s \le r).$$

It is clear that any thermodynamic function is at the same
time a phase function subject to the condition of having a
constant value on the surface of constant energy. Conversely,
any phase function which is constant on a surface of given
energy can be considered as a thermodynamic function.
Consider the transition of a system from one of its states
$Z_1(\vartheta, \lambda_1, \ldots, \lambda_r)$ (determined in the above described
classical thermodynamical sense) into another "immediately adjoining"
state $Z_2(\vartheta + d\vartheta, \lambda_1 + d\lambda_1, \ldots, \lambda_r + d\lambda_r)$.
The work done by the system in this transition is given by:

$$-\delta A = -\sum_{s=1}^{r} X_s\,d\lambda_s = -\sum_{s=1}^{r}\frac{\partial E}{\partial\lambda_s}\,d\lambda_s.$$

It must be remembered here that the generalized forces $X_s$
are not thermodynamic functions, but rather phase functions, depending
on the entire set of the phase coordinates (in addition, of course, to
the external parameters). These forces (and consequently also
the work $-\delta A$) are not determined by the knowledge of the
states $Z_1$ and $Z_2$, since a given thermodynamical state of a
system corresponds to an entire continuum of individual states
in the sense of statistical mechanics (the entire energy surface
in the phase space of the system). From the point of view of
statistical mechanics the work $-\delta A$ is not determined uniquely
by the original and final thermodynamic characteristics of the
system; in fact, it can have entirely different values depending
on the exact position of the representative point of the system
in its phase space. This indicates that the quantity $-\delta A$
cannot be identified with the elementary work in the sense of
physical thermodynamics.

As we have often seen above, such a situation is typical in
our theory; in any attempt to build a bridge between statistical
mechanics and any physical theory the role of physical quantities
is played not by the phase functions themselves but rather
by their mean values taken over a given thermodynamical state
of a system.

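In our case the equivalent of the elementary work as
considered in classical thermodynamics is not the phase function
$-\delta A$ but its mean value on a given surface of constant energy:

$$-\overline{\delta A} = -\sum_{s=1}^{r}\bar{X}_s\,d\lambda_s = \frac{1}{\vartheta}\sum_{s=1}^{r}\frac{\partial\log\Phi}{\partial\lambda_s}\,d\lambda_s.$$

Writing $dE$ for the total energy change of the system in its
transition from the state $Z_1$ into the state $Z_2$ we have

$$dE = \frac{\partial E}{\partial\vartheta}\,d\vartheta + \sum_{s=1}^{r}\frac{\partial E}{\partial\lambda_s}\,d\lambda_s.$$

The above expression is determined uniquely by the states $Z_1$
and $Z_2$, since $E$ is a thermodynamical function with a well
defined value for each thermodynamically described state. But

$$\sum_{s=1}^{r}\frac{\partial\log\Phi}{\partial\lambda_s}\,d\lambda_s = d\log\Phi - \frac{\partial\log\Phi}{\partial\vartheta}\,d\vartheta = d\log\Phi + E\,d\vartheta$$

and consequently

$$-\overline{\delta A} = \frac{1}{\vartheta}\,[d\log\Phi + E\,d\vartheta].$$

Therefore,

$$\vartheta(dE - \overline{\delta A}) = \vartheta\frac{\partial E}{\partial\vartheta}\,d\vartheta + \vartheta\sum_{s=1}^{r}\frac{\partial E}{\partial\lambda_s}\,d\lambda_s + d\log\Phi + E\,d\vartheta$$

$$= \frac{\partial(\vartheta E)}{\partial\vartheta}\,d\vartheta + \sum_{s=1}^{r}\frac{\partial(\vartheta E)}{\partial\lambda_s}\,d\lambda_s + d\log\Phi = d(\vartheta E + \log\Phi).$$

Thus we see that the quantity

$$\vartheta(dE - \overline{\delta A})$$

is the total differential of a certain thermodynamic function.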
The above result really contains the second law of
thermodynamics. In fact, in the classical presentation of
thermodynamics the quantity $dE$ is the sum of the work $\delta A$ done by
the external forces, and the "amount of heat" $\delta Q$ received by
the system during the elementary transition. Since the quantity
$\delta Q$ is formally defined as the difference $dE - \delta A$, it is clear
that it need not necessarily be the total differential of some
thermodynamic function. However, the second law of
thermodynamics tells us that the quantity $\delta Q/T$, where $T$ is the
absolute temperature of the system, is always a total differential.

The most satisfactory statement of this law is as follows:
there exists such a function $\vartheta$ of the temperature, and such a
function $W$ of the temperature and of the external parameters,
that for any elementary change of the thermodynamic state of
a system we have

$$\vartheta\,\delta Q = dW.$$

In other words the function $\vartheta$ represents the integrating factor
of the quantity $\delta Q$.

The existence of such an integrating factor, depending only
on temperature, represents one of the formulations of the
second law of thermodynamics. This is, however, exactly
what we have proved if one considers the parameter $\vartheta$ to be
directly connected with the temperature of the system.

In classical thermodynamics the absolute temperature $T$ is
defined by the relation

$$\vartheta = \frac{1}{kT} \qquad (k = \text{Boltzmann's constant}) \tag{81}$$

in terms of this integrating factor $\vartheta$, the existence of which is
postulated in the second law. Introducing

$$kW = k[E\vartheta + \log\Phi(\vartheta)] = k\left(\log\Phi(\vartheta) - \vartheta\frac{\partial\log\Phi}{\partial\vartheta}\right) = S$$

we can write the second law of thermodynamics in the form

$$\frac{\delta Q}{T} = dS.$$

The thermodynamic function $S$ is known as the entropy of the
system. The above given argument is the complete foundation
of the second law of thermodynamics in the frame of our theory,
and indicates the reasonableness of the relation (81) as a
universal postulate pertaining to any system of the assumed
type.

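Example: for an ideal monatomic gas we have, according to
chapter V,

$$E = -\frac{\partial\log\Phi}{\partial\vartheta} = \frac{3n}{2\vartheta} = \frac{3n}{2}kT,$$

$$-\overline{\delta A} = p\,dV = \frac{n}{\vartheta V}\,dV,$$

$$dE = -\frac{3n}{2\vartheta^2}\,d\vartheta,$$

$$\delta Q = dE - \overline{\delta A} = -\frac{3n}{2\vartheta^2}\,d\vartheta + \frac{n}{\vartheta V}\,dV = \frac{3}{2}kn\,dT + \frac{knT}{V}\,dV,$$

$$\frac{\delta Q}{T} = \frac{3}{2}kn\,\frac{dT}{T} + kn\,\frac{dV}{V} = d\left(\frac{3}{2}kn\log T + kn\log V\right) = dS,$$

from which follows

$$S = \frac{3}{2}kn\log T + kn\log V + C, \tag{82}$$

where $C$ is a constant.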
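
The role of $1/T$ as an integrating factor can be made concrete by
integrating the heat along two different paths between the same end
states. The sketch below uses the ideal-gas expression
$\delta Q = \frac{3}{2}kn\,dT + (knT/V)\,dV$ with the product $kn$ set
equal to 1; the paths, the function names and the numerical values are
hypothetical choices for illustration.

```python
import math

N_K = 1.0   # the product n*k, set to 1 in arbitrary units

def integrate_heat(path, steps=20000):
    """Integrate dQ = (3/2) n k dT + (n k T / V) dV and dQ/T along a path.
    `path` maps s in [0, 1] to a point (T, V); midpoint rule in s."""
    q = s_change = 0.0
    for j in range(steps):
        t0, v0 = path(j / steps)
        t1, v1 = path((j + 1) / steps)
        t_mid, v_mid = 0.5 * (t0 + t1), 0.5 * (v0 + v1)
        dq = 1.5 * N_K * (t1 - t0) + N_K * t_mid / v_mid * (v1 - v0)
        q += dq
        s_change += dq / t_mid
    return q, s_change

T1, V1, T2, V2 = 300.0, 1.0, 600.0, 3.0

def heat_then_expand(s):
    # first raise T at constant V, then raise V at constant T
    if s < 0.5:
        return (T1 + 2 * s * (T2 - T1), V1)
    return (T2, V1 + (2 * s - 1) * (V2 - V1))

def expand_then_heat(s):
    # first raise V at constant T, then raise T at constant V
    if s < 0.5:
        return (T1, V1 + 2 * s * (V2 - V1))
    return (T1 + (2 * s - 1) * (T2 - T1), V2)

q_a, ds_a = integrate_heat(heat_then_expand)
q_b, ds_b = integrate_heat(expand_then_heat)
ds_exact = 1.5 * N_K * math.log(T2 / T1) + N_K * math.log(V2 / V1)
print(q_a, q_b)               # the heat received depends on the path
print(ds_a, ds_b, ds_exact)   # the integral of dQ/T does not
```

The two values of the heat differ, while both integrals of
$\delta Q/T$ reproduce the entropy difference given by (82).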

As we have already mentioned, we will not be concerned in
this book with the derivation of the entire system of thermo- 
dynamics; this can be done on the basis of its two fundamental 
laws without relation to the statistical point of view. Our task 
was only to show that these two fundamental laws represent 
the necessary consequences of our point of view. 

However, in concluding this chapter, we must discuss in some 
detail a number of fundamental questions connected with the 
notion of entropy. 


33. The properties of entropy. The notion of entropy is 
one of the most important physical notions from a theoretical 
as well as from a practical point of view. Very few other 
notions can compete with it in respect to the abundance of 
attempts to clarify its theoretical and philosophical meaning. 

Many of these attempts are closely connected with the 
statistical interpretation of the phenomenon of heat, and are 
sometimes directly based on such an interpretation. Our problem
is to see to what extent such probabilistic foundations of
thermodynamics give a basis for certain far reaching state- 
ments concerning the nature of entropy. 

In the above discussions we have always defined the quantity
$\vartheta$ as the single root of the equation

$$\frac{d\log\Phi(\alpha)}{d\alpha} + E = 0.$$

Hence, as we have seen in chapter IV, $\vartheta$ coincides with the value
of the variable $\alpha$ for which the function

$$e^{E\alpha}\,\Phi(\alpha)$$

and, consequently, also its logarithm

$$E\alpha + \log\Phi(\alpha) \tag{83}$$

possess a single minimum. But, by the definition in the previous
section, the quantity

$$E\vartheta + \log\Phi(\vartheta)$$

is equal to the entropy of the system except for a constant
factor. Thus we see that the entropy can be defined (up to a
constant factor) as the minimum value of the function (83),
which is attained at the argument $\alpha = \vartheta$. This permits us to
establish one important property of the entropy.
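
The characterization of $\vartheta$ as the minimizing argument can be
illustrated for the ideal monatomic gas, for which
$\log\Phi(\alpha) = -\frac{3n}{2}\log\alpha + \text{const}$. The sketch
below locates the minimum of $E\alpha + \log\Phi(\alpha)$ by a ternary
search (a numerical method chosen here for illustration, not taken from
the text) and compares it with the root $3n/(2E)$ of the defining
equation. The function names and the numerical values are hypothetical.

```python
import math

def f(alpha, energy, n):
    """E*alpha + log Phi(alpha) for an ideal monatomic gas, dropping the
    alpha-independent constant: log Phi = -(3n/2) log(alpha) + const."""
    return energy * alpha - 1.5 * n * math.log(alpha)

def argmin_ternary(func, lo, hi, iterations=200):
    """Ternary search for the minimum of a strictly convex function."""
    for _ in range(iterations):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if func(a) < func(b):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)

n, energy = 100, 60.0                       # illustrative values
theta_numeric = argmin_ternary(lambda a: f(a, energy, n), 1e-3, 1e3)
theta_exact = 1.5 * n / energy              # root of d(log Phi)/d(alpha) + E = 0
print(theta_numeric, theta_exact)
```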

Suppose we have two systems which together form a
composite system. We will use the indices 1 and 2 for quantities
pertaining to these systems, and no index for the composite
system. Since $E = E_1 + E_2$ and

$$\Phi(\alpha) = \Phi_1(\alpha)\Phi_2(\alpha)$$

we have

$$E\alpha + \log\Phi(\alpha) = E_1\alpha + \log\Phi_1(\alpha) + E_2\alpha + \log\Phi_2(\alpha),$$

so that

$$S = k[E\vartheta + \log\Phi(\vartheta)] = k[E_1\vartheta + \log\Phi_1(\vartheta)] + k[E_2\vartheta + \log\Phi_2(\vartheta)].$$

Since the functions $E_1\alpha + \log\Phi_1(\alpha)$ and
$E_2\alpha + \log\Phi_2(\alpha)$ reach a minimum for $\alpha = \vartheta_1$ and
$\alpha = \vartheta_2$ respectively, we obtain

$$S \ge k[E_1\vartheta_1 + \log\Phi_1(\vartheta_1)] + k[E_2\vartheta_2 + \log\Phi_2(\vartheta_2)] = S_1 + S_2.$$

This means that the entropy of the system obtained by
bringing into thermal interaction two previously isolated systems is
never smaller than the sum of the entropies of the two
components; the two quantities become equal if the two
components have originally the same temperature.
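
The inequality $S \ge S_1 + S_2$ can be evaluated explicitly for two
monatomic ideal gases. The sketch below assumes
$\log\Phi_j(\alpha) = -\frac{3n_j}{2}\log\alpha + \text{const}$ and
$\vartheta_j = 3n_j/(2E_j)$; the additive constants cancel in the
difference. The function names and the numerical values are
hypothetical.

```python
import math

def f(alpha, energy, n):
    """E*alpha + log Phi(alpha) for a monatomic ideal gas, up to an
    additive constant: log Phi(alpha) = -(3n/2) log(alpha) + const."""
    return energy * alpha - 1.5 * n * math.log(alpha)

def entropy_gain_over_k(n1, e1, n2, e2):
    """(S - S1 - S2)/k for two gases brought into thermal contact."""
    theta1 = 1.5 * n1 / e1
    theta2 = 1.5 * n2 / e2
    theta = 1.5 * (n1 + n2) / (e1 + e2)
    return (f(theta, e1, n1) - f(theta1, e1, n1)
            + f(theta, e2, n2) - f(theta2, e2, n2))

gain_unequal = entropy_gain_over_k(1000, 300.0, 4000, 6000.0)
gain_equal = entropy_gain_over_k(1000, 300.0, 4000, 1200.0)
print(gain_unequal, gain_equal)   # positive for unequal theta, zero for equal theta
```

In the second call both gases have the same value of $\vartheta$ before
contact, and the gain vanishes, as the text states for components of
equal temperature.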

It must be noticed here that this theorem is often used,
without sufficient foundation, in reaching rather broad
conclusions, and the theorem itself is often expressed in rather
indefinite and exaggerated terms. For instance, one states
that because of thermal interaction of material bodies the
entropy of the universe is constantly increasing. It is also
stated that the entropy of a system "which is left to itself"
must always increase; taking into account the probabilistic
foundation of thermodynamics, one often ascribes to this
statement a statistical rather than an absolute character. This
formulation is wrong if only because the entropy of an isolated
system is a thermodynamic function (not a phase function),
which means that it cannot be considered as a random quantity;
if $E$ and all $\lambda_s$ remain constant the entropy cannot change its
value, whereas by changing these parameters in an appropriate
way we can make the entropy increase or decrease at will.
Some authors¹ try to generalize the notion of entropy by
considering it as being a phase function which, depending on the
phase, can assume different values for the same set of
thermodynamical parameters, and try to prove that entropy so
defined must increase, with overwhelming probability. However,
such a proof has not yet been given, and it is not at all clear
how such an artificial generalization of the notion of entropy
could be useful to the science of thermodynamics.

We will arrive at a much more rational formulation of the
problem if we consider the given system as a part of another,
more extensive system. Let us assume that this more
extensive system (which we will characterize by the asterisk)
is in thermal equilibrium (compare chapter V, section 25). In
other words, our system represents only an infinitesimally small
part of this large system. In this case the energy $E$ of the given
system is no longer determined by the values of the parameters
$\vartheta$, $\lambda_s$ ($\vartheta = \vartheta^*$ since, being in thermal equilibrium with the
larger system, our system has the same temperature) but is a
random quantity whose distribution law is given approximately
(compare chapter V, section 20) by the density:

\[ \frac{\Omega(E)\,e^{-\vartheta E}}{\Phi(\vartheta)}. \]

It is clear that the relation $E = -\partial \log \Phi / \partial \vartheta$ cannot hold in
this case since the left hand side of this expression is a random
quantity, whereas the right hand side is determined exactly
by the temperature $\vartheta^*$ of the thermostat. Thus the quantity
$E$ can no longer play the same role as before with respect to
the second law of thermodynamics.

¹Comp. Borel, Mécanique statistique classique, Paris 1925.

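However, the mean value of $E$, given by

\[ \bar{E} = \int \frac{E\,\Omega(E)\,e^{-\vartheta E}}{\Phi(\vartheta)}\,dE
           = -\frac{\partial \log \Phi(\vartheta)}{\partial \vartheta}, \]

is related to $\vartheta$ in the same way as in the case of an isolated
system. Thus, defining the entropy of the system by the
expression

\[ S = k[\bar{E}\vartheta + \log \Phi(\vartheta)] \tag{84} \]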


we deal with a thermodynamic function which is subject to all 
the arguments set forth in the previous section; in particular 
the proof of the second law of thermodynamics remains com- 
pletely unchanged. 
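
The relation between the mean energy and the logarithmic derivative of the
generating function can be verified numerically in a simple model. The
following sketch assumes (purely for illustration) a structure function
$\Omega(E) = E^{a-1}/\Gamma(a)$, for which $\Phi(\vartheta) = \vartheta^{-a}$; the
function names and the numerical parameters are hypothetical.

```python
import math

def canonical_density(E, a, theta):
    """Omega(E) * exp(-theta*E) / Phi(theta).

    Assumption: Omega(E) = E**(a-1) / Gamma(a), hence Phi(theta) = theta**(-a).
    Evaluated through logarithms to avoid overflow.
    """
    log_density = ((a - 1) * math.log(E) - theta * E
                   - math.lgamma(a) + a * math.log(theta))
    return math.exp(log_density)

def mean_energy_by_quadrature(a, theta, upper=40.0, steps=40000):
    """Midpoint-rule estimate of the integral of E * density over (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        E = (i + 0.5) * h
        total += E * canonical_density(E, a, theta)
    return total * h

def mean_energy_from_phi(a, theta, h=1e-5):
    """-d log Phi / d theta by a central difference, log Phi = -a*log(theta)."""
    def log_phi(t):
        return -a * math.log(t)
    return -(log_phi(theta + h) - log_phi(theta - h)) / (2 * h)

# Both numbers should be close to a / theta.
print(mean_energy_by_quadrature(5.0, 2.0), mean_energy_from_phi(5.0, 2.0))
```

The upper integration limit is chosen so that the neglected tail is
negligible for the parameters used here; other parameters may need a
different limit.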

Using the formula (84) one can give a simple derivation of
one of the most fundamental inequalities of thermodynamics.
Let us assume that the system described by the numbers $E_1$,
$\vartheta_1$, $S_1$ interacts thermally with another system described by
the numbers $E_2$, $\vartheta_2$, $S_2$ (this second system is not necessarily
considered to be a thermostat). Let the numbers $E = E_1 + E_2$,
$\vartheta$, $S$ characterize this composite system. Since the function
$E_1\alpha + \log \Phi_1(\alpha)$ possesses a minimum for $\alpha = \vartheta_1$ we have

\[ S_1 = k[E_1\vartheta_1 + \log \Phi_1(\vartheta_1)] \le k[E_1\vartheta + \log \Phi_1(\vartheta)]. \]

On the other hand formula (84) gives us

\[ S_1' = k[\bar{E}_1\vartheta + \log \Phi_1(\vartheta)], \]

where $\bar{E}_1$ is the mean energy and $S_1'$ the entropy of the given
system considered as a part of the composite system. It follows:

\[ S_1' - S_1 \ge k\vartheta(\bar{E}_1 - E_1) \]

where, because of the approximate expression (51),

\[ \bar{E}_1 = -\left(\frac{d \log \Phi_1}{d\alpha}\right)_{\alpha=\vartheta}, \qquad
   E_1 = -\left(\frac{d \log \Phi_1}{d\alpha}\right)_{\alpha=\vartheta_1}. \]

If $\vartheta_1 < \vartheta < \vartheta_2$, we have $E_1 > \bar{E}_1$ since, as we know, the
function $(d \log \Phi_1)/d\alpha$ never decreases. Therefore:

\[ \vartheta(\bar{E}_1 - E_1) \ge \vartheta_2(\bar{E}_1 - E_1). \]

The same relation takes place for $\vartheta_1 > \vartheta > \vartheta_2$ since in this
case $E_1 < \bar{E}_1$.

Thus, in all cases we have

\[ S_1' - S_1 \ge k\vartheta_2(\bar{E}_1 - E_1) = \frac{\bar{E}_1 - E_1}{T_2}, \]

which can be described by the following statement. The entropy
increase of a system resulting from its thermal interaction with
any other system cannot be smaller than the energy increase of the
first system divided by the absolute temperature of the second.
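
A small numerical check of this inequality is sketched below. It is an
illustration only, under the assumption (not made in the text) that both
systems have generating functions of the form $\Phi_i(\alpha) = \alpha^{-a_i}$,
as for ideal gases with $a_i = \tfrac{3}{2} n_i$; the names are hypothetical
and $k$ is set to 1.

```python
import math

def clausius_sides(E1, a1, E2, a2):
    """Return (entropy gain of system 1, energy gain of system 1 times theta2).

    Assumption: log Phi_i(alpha) = -a_i * log(alpha), so that
    theta_i = a_i / E_i before contact and theta = (a1 + a2) / (E1 + E2)
    for the composite system. Entropies are measured in units of k.
    """
    theta1 = a1 / E1
    theta2 = a2 / E2
    theta = (a1 + a2) / (E1 + E2)
    E1_bar = a1 / theta                        # mean energy after contact
    entropy_gain = a1 * math.log(theta1 / theta)
    energy_gain_over_T2 = theta2 * (E1_bar - E1)
    return entropy_gain, energy_gain_over_T2

# System 1 initially colder, then initially hotter, than system 2.
print(clausius_sides(1.0, 10.0, 5.0, 10.0))
print(clausius_sides(5.0, 10.0, 1.0, 10.0))
```

In both cases the first number is not smaller than the second, whichever
system is initially hotter.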

One can, of course, generalize the notion of the entropy by
writing for any state of the system

\[ S = k[E\vartheta + \log \Phi(\vartheta)] \]

in which case the entropy itself becomes a random quantity.

For such a definition of entropy the second law of thermodynamics
loses meaning, remaining applicable, however, to the
mean value $\bar{S}$ which apparently is identical with the expression
(84). In this case the distribution law of the given system in
its phase space is given by the probability density

\[ \frac{e^{-\vartheta E}}{\Phi(\vartheta)} \]

from which follows

\[ S = -k \log \frac{e^{-\vartheta E}}{\Phi(\vartheta)}. \]

This expression is often used to justify the statement that
“the entropy of a system is proportional to the logarithm of
the probability of the corresponding state” (Boltzmann’s
postulate). This statement, which is absolutely meaningless in
the case of an isolated system, obtains, as we see, some meaning
for a system in a larger system. This can be accomplished,
however, only by using the above described generalization of
the notion of entropy which is introduced “ad hoc”. In fact,
one must not forget that this notion is used in connection with
the second law of thermodynamics, which loses meaning when
the generalized definition of entropy is used. All existing
attempts to give a general proof of this postulate must be
considered as an aggregate of logical and mathematical errors
superimposed on a general confusion in the definition of the
basic quantities.² In the most serious treatises on that subject
(for example: R. H. Fowler, “Statistical Mechanics”, Cambridge
1936) the authors refuse to accept this postulate, indicating
that it cannot be proved, and cannot be given a sensible
formulation even on the basis of the exact notions of
thermodynamics.

However, proceeding in this direction we can obtain some
reasonable and rather interesting results.

Consider two isolated systems, characterized by the indices 
1 and 2, which form a composite system (characterized by no 
indices). 

The total energy E of the composite system can be distributed 
in many different ways between the two thermally interacting 
components (the probabilities of various distributions have 
been considered in detail in section 22, chapter V). Let us 
write $p(E_1)\,dE_1$ for the probability that the energy of the first
system lies in the interval $[E_1, E_1 + dE_1]$.

We know that the sum of the entropies $S_1 + S_2$ of the
isolated systems can never exceed the entropy $S$ of the
composite system so that

\[ S - (S_1 + S_2) \ge 0. \]

It can be shown that the excess is proportional, except for
a constant additive term, to the logarithm of the quantity
$p(E_1)$. This statement has the following meaning. Consider two
systems which interact thermally forming a single system with
the energy $E$ and the entropy $S$. At some moment we isolate
the two systems from each other thus obtaining two systems
with energies $E_1$ and $E_2$ ($E_1 + E_2 = E$), and the entropies $S_1$
and $S_2$ ($S_1 + S_2 \le S$). The quantities $E_1$ and $E_2$ must be
considered as random quantities with the probability densities
$p(E_1)$ and $p(E_2)$. Our statement tells us that the quantity
$\log p(E_1)$ (as well as $\log p(E_2)$) is connected linearly with the
quantity $S - (S_1 + S_2)$.

²Comp. the “proof” in “Thermodynamik” by M. Planck, and the
corresponding critique in “Statistical Mechanics” by R. H. Fowler.

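To prove this we must remember that, according to formula
(27) chapter IV:

\[ p(E_1) = \frac{\Omega_1(E_1)\,\Omega_2(E - E_1)}{\Omega(E)} = \frac{\Omega_1(E_1)\,\Omega_2(E_2)}{\Omega(E)}. \]

On the other hand, because of (42) chapter V, we have,
approximately,

\[ \Omega(E) = \frac{\Phi(\vartheta)\,e^{\vartheta E}}{(2\pi B)^{1/2}} = \frac{e^{S/k}}{(2\pi B)^{1/2}}. \]

In similar fashion

\[ \Omega_1(E_1) = \frac{e^{S_1/k}}{(2\pi B_1)^{1/2}}, \qquad
   \Omega_2(E_2) = \frac{e^{S_2/k}}{(2\pi B_2)^{1/2}}, \]

where

\[ B = \left(\frac{d^2 \log \Phi(\alpha)}{d\alpha^2}\right)_{\alpha=\vartheta}, \qquad
   B_1 = \left(\frac{d^2 \log \Phi_1(\alpha)}{d\alpha^2}\right)_{\alpha=\vartheta_1}, \qquad
   B_2 = \left(\frac{d^2 \log \Phi_2(\alpha)}{d\alpha^2}\right)_{\alpha=\vartheta_2}. \]

This gives us the approximate relation:

\[ \log p(E_1) = -\frac{1}{k}\,[S - (S_1 + S_2)] + \frac{1}{2} \log \frac{B}{2\pi B_1 B_2}. \tag{85} \]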
The second term on the right hand side cannot be considered
as a constant (independent of chance) quantity since $B_1$ and
$B_2$ depend on $\vartheta_1$ and $\vartheta_2$ which are in turn connected with the
random quantity $E_1$. It can be easily shown, however, that in
the case when $E_1$ does not deviate appreciably from its mean
value $\bar{E}_1$ (for which $\vartheta_1 = \vartheta$), the values of $B_1$ and $B_2$ are very
close to

\[ \left(\frac{d^2 \log \Phi_1(\alpha)}{d\alpha^2}\right)_{\alpha=\vartheta}, \qquad
   \left(\frac{d^2 \log \Phi_2(\alpha)}{d\alpha^2}\right)_{\alpha=\vartheta}. \]

These quantities, which do not contain any random element,
will be denoted by $B_1'$ and $B_2'$. We will indicate here only the
main steps of this proof leaving the more detailed calculations
to the reader.

Because of the fundamental law of composition of the generating
functions, the logarithms of these functions, as well as
all derivatives of these logarithms, are infinitely large quantities
of the order $n$ where $n$ is the number of molecules forming
the system. In particular the quantities $E_1$, $E_2$, $B_1$, $B_2$, $B_1'$,
$B_2'$ are infinitely large quantities in the above described sense.
For the same reason the quantities $B_1 - B_1'$ and $B_2 - B_2'$
have orders of magnitude $n(\vartheta_1 - \vartheta)$ and $n(\vartheta_2 - \vartheta)$ for small
values of $(\vartheta_1 - \vartheta)$ and $(\vartheta_2 - \vartheta)$. Thus the ratios $B_1'/B_1$ and
$B_2'/B_2$ differ from unity by quantities of the order $(\vartheta_1 - \vartheta)$
and $(\vartheta_2 - \vartheta)$. On the other hand:

\[ E_1 = -\left(\frac{d \log \Phi_1}{d\alpha}\right)_{\alpha=\vartheta_1} \]

and, approximately, (according to (51) chapter V):

\[ \bar{E}_1 = -\left(\frac{d \log \Phi_1}{d\alpha}\right)_{\alpha=\vartheta} \]

so that

\[ E_1 - \bar{E}_1 = -\left(\frac{d^2 \log \Phi_1}{d\alpha^2}\right)_{\alpha=\vartheta}(\vartheta_1 - \vartheta) + O[n(\vartheta_1 - \vartheta)^2]
   = -B_1'(\vartheta_1 - \vartheta) + O[n(\vartheta_1 - \vartheta)^2]. \]

Thus we conclude that for

\[ E_1 - \bar{E}_1 = o(n) \]

the difference $\vartheta_1 - \vartheta$ is an infinitesimally small quantity of
the order:

\[ \frac{E_1 - \bar{E}_1}{n}. \]

Consequently:

\[ \frac{B_1'}{B_1} = 1 + O\!\left(\frac{E_1 - \bar{E}_1}{n}\right), \qquad
   \frac{B_2'}{B_2} = 1 + O\!\left(\frac{E_1 - \bar{E}_1}{n}\right), \]

and

\[ \log B_1' - \log B_1 = O\!\left(\frac{E_1 - \bar{E}_1}{n}\right), \qquad
   \log B_2' - \log B_2 = O\!\left(\frac{E_1 - \bar{E}_1}{n}\right). \]

If, as we have assumed, the deviation of $E_1$ from its mean value
$\bar{E}_1$ is negligibly small in comparison with $n$ (which is extremely
probable because, as we shall see later, the mean square deviation
of the quantity $E_1$ is of the order of magnitude of $n^{1/2}$), the
estimates given above permit us to substitute the quantities
$B_1'$ and $B_2'$ for the quantities $B_1$ and $B_2$ in the formula (85),
and obtain the approximate relation

\[ \log p(E_1) = \frac{1}{k}\,(S_1 + S_2 - S) + \frac{1}{2} \log \frac{B}{2\pi B_1' B_2'} \]

which proves our statement.
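
The accuracy of the approximate relation (85) can be examined in a model
where the exact density $p(E_1)$ is available in closed form. The sketch
below is illustrative only and assumes structure functions
$\Omega_i(E) = E^{a_i - 1}/\Gamma(a_i)$, that is $\Phi_i(\alpha) = \alpha^{-a_i}$
(ideal gases with $a_i = \tfrac{3}{2} n_i$); the function names are
hypothetical and entropies are measured in units of $k$.

```python
import math

def log_omega_exact(E, a):
    """log of the assumed structure function Omega(E) = E**(a-1) / Gamma(a)."""
    return (a - 1) * math.log(E) - math.lgamma(a)

def entropy_over_k(E, a):
    """E*theta + log Phi(theta) with Phi = theta**(-a) and theta = a / E."""
    theta = a / E
    return E * theta - a * math.log(theta)

def second_log_derivative(E, a):
    """d^2 log Phi / d alpha^2 at alpha = theta = a / E, i.e. a / theta**2."""
    theta = a / E
    return a / theta ** 2

def log_p_exact(E1, E, a1, a2):
    """log of Omega1(E1) * Omega2(E - E1) / Omega(E)."""
    return (log_omega_exact(E1, a1) + log_omega_exact(E - E1, a2)
            - log_omega_exact(E, a1 + a2))

def log_p_formula_85(E1, E, a1, a2):
    """Right hand side of (85) with k = 1."""
    E2 = E - E1
    s, s1, s2 = (entropy_over_k(E, a1 + a2), entropy_over_k(E1, a1),
                 entropy_over_k(E2, a2))
    B, B1, B2 = (second_log_derivative(E, a1 + a2),
                 second_log_derivative(E1, a1),
                 second_log_derivative(E2, a2))
    return (s1 + s2 - s) + 0.5 * math.log(B / (2 * math.pi * B1 * B2))

print(log_p_exact(140.0, 300.0, 150.0, 150.0),
      log_p_formula_85(140.0, 300.0, 150.0, 150.0))
```

In this model the discrepancy between the two values is just the error of
Stirling's formula and decreases as the numbers of molecules grow.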


34. Other thermodynamical functions. Before finishing 
this chapter we want to consider the expression of some other 
important thermodynamic functions in terms of our general 
theory. 

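1. The thermodynamical potential or characteristic function of
Planck

\[ \Psi(\vartheta, \lambda_s) = \log \Phi(\vartheta) \]

is a convenient function because of the fact that almost all of
the most important thermodynamical functions of a system can be
expressed in terms of it. Thus,

\[ E = -\frac{\partial \Psi}{\partial \vartheta}, \qquad
   S = k\left(\Psi - \vartheta\,\frac{\partial \Psi}{\partial \vartheta}\right), \qquad
   \bar{X}_s = \frac{1}{\vartheta}\,\frac{\partial \Psi}{\partial \lambda_s} \quad (s = 1, 2, \dots, r); \]

etc.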

2. Heat capacity. Let us assume that the temperature and the
external parameters of a given system are subject simultaneously
to some infinitesimal change. The heat capacity of the
system is defined as the quantity

\[ C = \frac{\delta Q}{dT} = \frac{dE + \overline{\delta A}}{dT} \]

where $\overline{\delta A}$ is the mean value of the element of work discussed
in section 30. It is clear that this quantity assumes different
values for different changes of the parameters $\lambda_s$.

In the particular case when all the parameters $\lambda_s$ remain
constant ($d\lambda_s = 0$) we have

\[ C_\lambda = \frac{\partial E}{\partial T} = \frac{\partial E}{\partial \vartheta}\,\frac{d\vartheta}{dT}
   = -k\vartheta^2\,\frac{\partial E}{\partial \vartheta} = k\vartheta^2\,\frac{\partial^2 \log \Phi}{\partial \vartheta^2} = k\vartheta^2 B. \]

Let us consider the special case of an ideal monatomic gas
enclosed in a vessel, subject to no external forces except the
reactions of the vessel’s walls. In this case the only external
parameter will be the volume $V$ of the gas. Since $B =
(3n)/(2\vartheta^2)$, the heat capacity for constant volume is given by

\[ C_V = \tfrac{3}{2}\,kn. \tag{86} \]

Thus it is independent not only of the volume and temperature
but also of the nature of the gas.
The heat capacity of the gas calculated under the assumption
of constant pressure also plays an important role in physics.
In this case

\[ p = nkT/V = \text{const.}, \]

i.e.

\[ dT = \frac{p}{nk}\,dV. \tag{87} \]

In order to calculate this heat capacity, we notice that the
entropy of the ideal gas is given according to (82) by the
expression:

\[ S = k[E\vartheta + \log \Phi(\vartheta)] = kn \log V + \tfrac{3}{2}\,kn \log T + \text{const.} \]

Since, according to the second law of thermodynamics,

\[ \delta Q / T = dS, \]

we conclude that for any changes of $V$ and $T$ the heat capacity
is given by:

\[ C = T\,\frac{dS}{dT} = \frac{knT}{V}\,\frac{dV}{dT} + \tfrac{3}{2}\,kn \]

(in the particular case $dV = 0$ this reduces to the expression
(86)). In the case of constant pressure $dV$ and $dT$ are related
by (87) giving us the result: $C_p = (5/2)kn$ (remembering that
$nkT/V = p$).

Thus the quantity $C_p/C_V = 5/3$ represents a “universal
constant” which remains the same for any amount of any
monatomic gas under any physical conditions. This statement is
generally in good agreement with experiments, and the observed
deviations can always be satisfactorily explained by the
fact that the actual gases are never ideal.
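
The two heat capacities can be recovered numerically from the entropy
expression by differentiating along the appropriate path. The following
is a minimal sketch; the function names, the choice $k = 1$, and the
numerical values are assumptions made only for the illustration.

```python
import math

def entropy(T, V, n, k=1.0):
    """Ideal-gas entropy k*n*log(V) + 1.5*k*n*log(T), additive constant dropped."""
    return k * n * math.log(V) + 1.5 * k * n * math.log(T)

def heat_capacity(T, n, volume_along_path, k=1.0, h=1e-3):
    """C = T * dS/dT along a path V = volume_along_path(T), by central difference."""
    s_plus = entropy(T + h, volume_along_path(T + h), n, k)
    s_minus = entropy(T - h, volume_along_path(T - h), n, k)
    return T * (s_plus - s_minus) / (2 * h)

n, k, T, p = 10, 1.0, 300.0, 2.0

def constant_volume(_T):
    return 5.0                  # V is held fixed

def constant_pressure(T_):
    return n * k * T_ / p       # V follows from p = n*k*T/V

c_v = heat_capacity(T, n, constant_volume, k)
c_p = heat_capacity(T, n, constant_pressure, k)
print(c_v, c_p, c_p / c_v)
```

The printed ratio should agree with $5/3$ up to the finite-difference error.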


CHAPTER VIII 


DISPERSION AND THE DISTRIBUTIONS 
OF SUM FUNCTIONS 


35. The intermolecular correlation. Let us consider an
isolated system consisting of a large number $n$ of molecules, and
let us select any two of these molecules. Let $\varphi(P)$ be a phase
function of our system depending only on the dynamic coordinates
of the first molecule, and $\psi(P)$ a phase function depending
only on the dynamic coordinates of the second molecule.
The functions $\varphi(P)$ and $\psi(P)$, considered as random
quantities, are not statistically independent of one another
since their dynamic coordinates are related by the condition
that the total energy $E$ of the system remains constant. Because
of the large number of individual molecules composing
the system one should expect, of course, that the dependence
between the quantities $\varphi(P)$ and $\psi(P)$ is very weak. In particular,
one can expect that in any calculation the coefficient of
correlation between these two quantities is infinitesimally
small. As we shall soon see this is actually true. However, in
many problems (in particular in the calculation of the dispersion
of sum functions) one must calculate sums of a very large
number of such correlation coefficients; these sums often may
apparently diverge, and thus cannot be neglected.¹ For this
reason it is necessary to have at least an approximate expression
for the intermolecular correlation coefficient. This
question will be discussed in the present section.

It is clear that in order to obtain the asymptotic formula we
must impose certain limitations on the structure functions
$\omega_1(x)$ and $\omega_2(x)$ of the two molecules selected as well as on the
functions $\varphi(P)$ and $\psi(P)$. The point is that we are looking for
asymptotic formulae under the assumption that the number
of molecules $n$ as well as the total energy $E$ of the system
becomes infinitely large; it is clear without any calculation that
an unduly rapid increase of the four above mentioned functions
would disrupt their general asymptotic relations.

¹This is particularly so in the case when, not being satisfied by estimates
of the order of magnitude of the increase of dispersion, we want to obtain
its asymptotic expression. This occurs, for example, in the comparison
between theoretical and experimental results concerning fluctuations.

For our purposes it is quite sufficient to assume that each
of these four functions increases less rapidly than $Cx^r$, where
$x$ is the energy (of the first or second molecule) corresponding
to a given argument, and $C$ and $r$ are positive constants. This
condition is actually satisfied in practically all problems of
statistical mechanics.

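We will use the following notations:

$\Omega(x)$ = structure function of the system,
$\Phi(\alpha)$ = generating function of the system,
$U(x)$ = conjugate distribution of the system,

\[ -\left(\frac{d \log \Phi}{d\alpha}\right)_{\alpha=\vartheta} = E, \qquad
   \left(\frac{d^2 \log \Phi}{d\alpha^2}\right)_{\alpha=\vartheta} = B. \]

We will use letters with the upper indices (1), (2), and (12) to
denote quantities for the system without the first molecule, without
the second molecule, and without both of them respectively.
We will further denote by $e_1$, $\varphi_1(\alpha)$, $\omega_1(x)$, $\gamma_1$, $dv_1$ the energy,
the generating function, the structure function, the phase space,
and the volume element in it, for the first molecule. The same
letters with the index 2 will denote the same quantities for the
second molecule, whereas the index 12 will refer to the combination
of the two molecules. Obviously:

\[ e_{12} = e_1 + e_2, \qquad \varphi_{12}(\alpha) = \varphi_1(\alpha)\,\varphi_2(\alpha), \qquad dv_{12} = dv_1\,dv_2 \]

and the space $\gamma_{12}$ is the direct product of $\gamma_1$ and $\gamma_2$.
The correlation coefficient connecting the functions $\varphi$ and
$\psi$ is determined by the expression

\[ R(\varphi, \psi) = \frac{\overline{(\varphi - \bar{\varphi})(\psi - \bar{\psi})}}{(D\varphi\,D\psi)^{1/2}}, \]

where $D$ is the symbol of dispersion. For the numerator of
the above expression we have (according to (26) chapter IV)
the following basic expression:

\[ \overline{(\varphi - \bar{\varphi})(\psi - \bar{\psi})}
   = \int_{\gamma_{12}} (\varphi - \bar{\varphi})(\psi - \bar{\psi})\,\frac{\Omega^{(12)}(E - e_1 - e_2)}{\Omega(E)}\,dv_{12}. \]

Expressing the structure function in terms of the conjugate
distribution functions (by means of (34) chapter IV), and
putting $\alpha = \vartheta$ we have:

\[ R(\varphi, \psi) = \frac{1}{(D\varphi\,D\psi)^{1/2}} \int_{\gamma_{12}} (\varphi - \bar{\varphi})(\psi - \bar{\psi})\,
   \frac{e^{-\vartheta(e_1 + e_2)}}{\varphi_1(\vartheta)\,\varphi_2(\vartheta)}\,L(e_1, e_2)\,dv_{12} \tag{88} \]

where, for brevity

\[ L(e_1, e_2) = \frac{U^{(12)}(E - e_1 - e_2)}{U(E)}. \]

As usual, our aim is to secure an asymptotic expression for
$R(\varphi, \psi)$ under the assumption $n \to \infty$.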

We begin with some general considerations. First of all we
can see from the approximate formula (39) (chapter V) that
the quantity $L(e_1, e_2)$ is bounded uniformly for any values of
$e_1$ and $e_2$ when $n \to \infty$; in fact, the quantity $U(E)$ is asymptotically
equal to $(2\pi B)^{-1/2}$ (i.e. is an infinitely small quantity of
the order $n^{-1/2}$), whereas, because of the same formula, the
order of magnitude of $U^{(12)}(E - e_1 - e_2)$ is in any case not
lower than $n^{-1/2}$.

We will also notice that for any arbitrary constants $a$, $b$,
$c$, $d$, and for the constant values $k \ge 0$, $l \ge 0$, we have for
$n \to \infty$:

\[ I_1 = \int_{e_1 > \log^2 n} (a\varphi + b)(c e_1 + d)^k\,e^{-\vartheta e_1}\,dv_1 = O(n^{-q}), \]
\[ I_2 = \int_{e_2 > \log^2 n} (a\psi + b)(c e_2 + d)^l\,e^{-\vartheta e_2}\,dv_2 = O(n^{-q}), \tag{89} \]

where $q$ is any real number. In fact, because of the above
assumptions concerning the properties of $\varphi(P)$ and $\omega_1(x)$, we
have, for example, for the first integral:

\[ |I_1| < C \int_{e_1 > \log^2 n} e_1^{\,r+k}\,e^{-\vartheta e_1}\,dv_1
         = C \int_{\log^2 n}^{\infty} x^{r+k}\,e^{-\vartheta x}\,\omega_1(x)\,dx \]
\[ \qquad < C_1 \int_{\log^2 n}^{\infty} x^{2r+k}\,e^{-\vartheta x}\,dx
          < C_2 \int_{\log^2 n}^{\infty} e^{-\vartheta x/2}\,dx = O(n^{-q}) \]

where $C$, $C_1$ and $C_2$ are positive constants.

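Turning now to the derivation of the asymptotic formula
for the quantity $R(\varphi, \psi)$, we use the formula (99) (see appendix)
for an estimate of the quantities $U^{(12)}(E - e_1 - e_2)$ and
$U(E)$. Also, using the arguments given at the end of the
appendix we find that for
$|e_1 - \bar{e}_1| < \log^2 n$ and $|e_2 - \bar{e}_2| < \log^2 n$:

\[ U^{(12)}(E - e_1 - e_2) = \frac{1}{(2\pi B^{(12)})^{1/2}} \exp\!\left[-\frac{w^2}{2B^{(12)}}\right]
   + \frac{S_n + T_n w}{B^{5/2}} + O\!\left(\frac{1 + |w|^3}{n^2}\right) \]

where we put for brevity:

\[ \bar{e}_1 + \bar{e}_2 - e_1 - e_2 = w. \]

On the other hand we have:

\[ U(E) = \frac{1}{(2\pi B)^{1/2}} + S_n B^{-5/2} + O(n^{-2}). \]

Thus, in the part $\gamma_{12}'$ of the space $\gamma_{12}$ which is characterized
by the inequalities $e_1 < \log^2 n$ and $e_2 < \log^2 n$, we have

\[ L(e_1, e_2) = \frac{\left(\dfrac{B}{B^{(12)}}\right)^{1/2} \exp\!\left[-\dfrac{w^2}{2B^{(12)}}\right]
   + (2\pi)^{1/2} S_n B^{-2} + (2\pi)^{1/2} T_n B^{-2} w + O\!\left(\dfrac{1 + |w|^3}{n^{3/2}}\right)}
   {1 + (2\pi)^{1/2} S_n B^{-2} + O(n^{-3/2})}. \]

Noticing that

\[ \left(\frac{B}{B^{(12)}}\right)^{1/2} = 1 + \frac{b_1 + b_2}{2B} + O(n^{-2}), \]

and denoting for brevity

\[ 1 + (2\pi)^{1/2} S_n B^{-2} = A \]

we find

\[ L(e_1, e_2) = \frac{A + \dfrac{b_1 + b_2}{2B} - \dfrac{w^2}{2B} + (2\pi)^{1/2} T_n B^{-2} w + O\!\left(\dfrac{1 + |w|^3}{n^{3/2}}\right)}{A + O(n^{-3/2})} \]
\[ \qquad = 1 + \frac{b_1 + b_2}{2B} - \frac{w^2}{2B} + (2\pi)^{1/2} T_n B^{-2} w + O\!\left(\frac{1 + |w|^3}{n^{3/2}}\right). \tag{90} \]

The right hand side of this equality has the form:

\[ -\frac{(e_1 - \bar{e}_1)(e_2 - \bar{e}_2)}{B} + R + O\!\left(\frac{1 + |w|^3}{n^{3/2}}\right) \]

where $R$ is formed by the terms of the type

\[ K(e_i - \bar{e}_i)^s \]

($K$ is constant, $i = 1$ or 2, $s = 0$, 1 or 2). But:

\[ \int_{\gamma_{12}} (\varphi - \bar{\varphi})(\psi - \bar{\psi})\,\frac{e^{-\vartheta(e_1 + e_2)}}{\varphi_1(\vartheta)\,\varphi_2(\vartheta)}\,K(e_1 - \bar{e}_1)^s\,dv_{12} \]
\[ \qquad = K \int_{\gamma_1} (\varphi - \bar{\varphi})(e_1 - \bar{e}_1)^s\,\frac{e^{-\vartheta e_1}}{\varphi_1(\vartheta)}\,dv_1
   \int_{\gamma_2} (\psi - \bar{\psi})\,\frac{e^{-\vartheta e_2}}{\varphi_2(\vartheta)}\,dv_2. \]

According to formula (48) chapter V, the second integral
on the right hand side is an infinitesimally small quantity of
order not lower than $n^{-1}$. The first integral is of the same
order of magnitude for $s = 0$, and remains bounded from
above for $s = 1$ or 2. Therefore the terms with $s = 0$ (i.e.
$1 + (b_1 + b_2)/2B$) in the expression for $R$, when integrated
over the entire space $\gamma_{12}$, give a quantity of the order of
magnitude $O(n^{-2})$ in the estimate of the integral (88). All
other terms (i.e. those with $s = 1$ and $s = 2$) contain a constant
(with respect to the variables of integration) factor of the
order $n^{-1}$, giving after integration quantities of the order
$O(n^{-2})$. Thus:

\[ \int_{\gamma_{12}} (\varphi - \bar{\varphi})(\psi - \bar{\psi})\,\frac{e^{-\vartheta(e_1 + e_2)}}{\varphi_1(\vartheta)\,\varphi_2(\vartheta)}\,R\,dv_{12} = O(n^{-2}). \]

Since, on the other hand, because of (89):

\[ \int_{\gamma_{12}''} (\varphi - \bar{\varphi})(\psi - \bar{\psi})\,\frac{e^{-\vartheta(e_1 + e_2)}}{\varphi_1(\vartheta)\,\varphi_2(\vartheta)}\,R\,dv_{12} = O(n^{-2}) \]

where $\gamma_{12}'' = \gamma_{12} - \gamma_{12}'$ is the part of the space $\gamma_{12}$ determined
by the condition

\[ \max(e_1, e_2) > \log^2 n \]

we conclude that:

\[ \int_{\gamma_{12}'} (\varphi - \bar{\varphi})(\psi - \bar{\psi})\,\frac{e^{-\vartheta(e_1 + e_2)}}{\varphi_1(\vartheta)\,\varphi_2(\vartheta)}\,R\,dv_{12} = O(n^{-2}). \]

Denoting by $Q$ the remaining terms in the formula (90),
and by $C$ and $C_1$ positive constants, we have:

\[ \left| \int_{\gamma_{12}'} (\varphi - \bar{\varphi})(\psi - \bar{\psi})\,\frac{e^{-\vartheta(e_1 + e_2)}}{\varphi_1(\vartheta)\,\varphi_2(\vartheta)}\,Q\,dv_{12} \right| \]
\[ \qquad < C n^{-3/2} \int_{\gamma_{12}} |\varphi - \bar{\varphi}|\,|\psi - \bar{\psi}|\,(1 + |w|^3)\,\frac{e^{-\vartheta(e_1 + e_2)}}{\varphi_1(\vartheta)\,\varphi_2(\vartheta)}\,dv_{12}
   = C_1 n^{-3/2}. \]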
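Comparing the estimates so obtained we have:

\[ \int_{\gamma_{12}'} (\varphi - \bar{\varphi})(\psi - \bar{\psi})\,\frac{e^{-\vartheta(e_1 + e_2)}}{\varphi_1(\vartheta)\,\varphi_2(\vartheta)}\,
   \frac{U^{(12)}(E - e_1 - e_2)}{U(E)}\,dv_{12} \]
\[ \qquad = -\frac{1}{B} \int_{\gamma_{12}'} (\varphi - \bar{\varphi})(\psi - \bar{\psi})(e_1 - \bar{e}_1)(e_2 - \bar{e}_2)\,
   \frac{e^{-\vartheta(e_1 + e_2)}}{\varphi_1(\vartheta)\,\varphi_2(\vartheta)}\,dv_{12} + O(n^{-3/2}). \]

Since, as we have seen above, the ratio

\[ \frac{U^{(12)}(E - e_1 - e_2)}{U(E)} \]

remains uniformly bounded in the entire space $\gamma_{12}$ for $n \to \infty$,
the estimate given by (89) permits us to integrate both parts
of the above relation over the entire space $\gamma_{12}$. Therefore,
using (88), we obtain:

\[ R(\varphi, \psi) = -\frac{1}{B} \int_{\gamma_1} \frac{(\varphi - \bar{\varphi})(e_1 - \bar{e}_1)}{(D\varphi)^{1/2}}\,\frac{e^{-\vartheta e_1}}{\varphi_1(\vartheta)}\,dv_1
   \int_{\gamma_2} \frac{(\psi - \bar{\psi})(e_2 - \bar{e}_2)}{(D\psi)^{1/2}}\,\frac{e^{-\vartheta e_2}}{\varphi_2(\vartheta)}\,dv_2 + O(n^{-3/2}). \]

Using the formula (48) chapter V

\[ \int_{\gamma_1} \varphi\,\frac{e^{-\vartheta e_1}}{\varphi_1(\vartheta)}\,dv_1 = \bar{\varphi} + O(n^{-1}) \]

we can give similar estimates for other integrals, in which
$\varphi$ is replaced by $\varphi e_1$, $e_1$, $\psi e_2$, $\psi$, $e_2$. This gives

\[ R(\varphi, \psi) = -\frac{1}{B}\,\frac{\overline{(\varphi - \bar{\varphi})(e_1 - \bar{e}_1)}\;\overline{(\psi - \bar{\psi})(e_2 - \bar{e}_2)}}{(D\varphi\,D\psi)^{1/2}} + O(n^{-3/2}) \]
\[ \qquad = -\frac{(b_1 b_2)^{1/2}}{B}\,R(\varphi, e_1)\,R(\psi, e_2) + O(n^{-3/2}). \tag{91} \]
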

Both correlation coefficients entering into this expression
connect two phase functions of the same molecule, and can
be easily calculated if these functions are known. The basic
value of the asymptotic relation which is obtained in this way
is the fact that the correlation coefficient pertaining to the
phase functions of two different molecules is, asymptotically,
inversely proportional to $B$ (being an infinitesimally small
quantity of the order $n^{-1}$) for ever increasing $n$.

Let us now consider the expression (91) under various possible
assumptions.

1. In the case when the two selected molecules have the same
structure, and the functions $\varphi$ and $\psi$ are the same, we obtain
(putting $b_1 = b_2 = b$):

\[ R(\varphi, \psi) = -\frac{b}{B}\,R^2(\varphi, e) + O(n^{-3/2}), \]

where $e$ stands for the energy of one of the selected molecules.

2. If $\varphi = \psi$, and all the molecules have the same structure,
we have $B = nb$ and:

\[ R(\varphi, \psi) = -\frac{1}{n}\,R^2(\varphi, e) + O(n^{-3/2}). \]

3. If $\varphi = e_1$ and $\psi = e_2$, the formula (91) gives us:

\[ R(e_1, e_2) = -\frac{(b_1 b_2)^{1/2}}{B} + O(n^{-3/2}). \]

The negative sign of the coefficient $R$ in this last case could
be foreseen; in fact, since the stochastic relation between the
energies of two molecules is determined entirely by the condition
that the total energy of all molecules is a constant, the
decrease of the energy of any one molecule favors stochastically
the increase of the energy of any other molecule and vice versa.

4. Finally if all the molecules have the same structure and
$\varphi = e_1$ and $\psi = e_2$, we have:

\[ R(e_1, e_2) = -\frac{1}{n} + O(n^{-3/2}). \]
This result is trivial since in this case the formula:

\[ 0 = DE = D \sum_{i=1}^{n} e_i = \sum_{i=1}^{n} De_i + \sum_{i \ne k} (De_i\,De_k)^{1/2}\,R(e_i, e_k)
     = nb + n(n - 1)\,b\,R(e_i, e_k) \qquad (i \ne k) \]

leads to the exact relation:

\[ R(e_i, e_k) = -\frac{1}{n - 1} \qquad (i \ne k). \]
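
The exact value $-1/(n-1)$ can be observed in a small Monte Carlo experiment.
The sketch below is only an illustration and rests on an assumption not made
in the text: every molecule is given a constant structure function, so that
the energies $(e_1, \dots, e_n)$ are distributed uniformly over the simplex
$e_1 + \dots + e_n = E$. The function names, sample size and seed are
arbitrary choices.

```python
import random

def sample_energies(n, total_energy, rng):
    """Uniform point on the simplex e_1 + ... + e_n = total_energy.

    Normalised independent exponential variables give the uniform
    distribution on the simplex (assumed microcanonical model).
    """
    raw = [rng.expovariate(1.0) for _ in range(n)]
    scale = total_energy / sum(raw)
    return [scale * x for x in raw]

def estimate_correlation(n, samples=100000, seed=0):
    """Sample correlation coefficient between e_1 and e_2."""
    rng = random.Random(seed)
    s1 = s2 = s11 = s22 = s12 = 0.0
    for _ in range(samples):
        e = sample_energies(n, float(n), rng)
        x, y = e[0], e[1]
        s1 += x
        s2 += y
        s11 += x * x
        s22 += y * y
        s12 += x * y
    m1, m2 = s1 / samples, s2 / samples
    cov = s12 / samples - m1 * m2
    var1 = s11 / samples - m1 * m1
    var2 = s22 / samples - m2 * m2
    return cov / (var1 * var2) ** 0.5

if __name__ == "__main__":
    n = 5
    print(estimate_correlation(n), -1.0 / (n - 1))
```

The estimate fluctuates from run to run with the seed, but it should stay
close to $-1/(n-1)$ and not to $-1/n$ when $n$ is small.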


36. Dispersion and distribution laws of the sum functions.
As we have seen in section 13, chapter III, the estimates of
dispersion of sum functions play an important role in the
foundation of our entire theory. It is, in fact, the smallness
of the mean square deviations of these functions which permits
us to state that they assume values which are very close to
their mean values.

In the terminology of the theory of probability this is
equivalent to the statement that, for an infinitely large number
of molecules, the sum functions are subject to the law of
large numbers. In view of the discussions of the previous
section, obtaining estimates of dispersion of the sum function
does not present any essential difficulties.

Suppose we have a sum phase function:

\[ f(P) = \sum_{i=1}^{n} f_i(P) \]

where each term is a function of the dynamical coordinates of
one molecule only. The mean value of the function $f$ is:

\[ \bar{f} = \sum_{i=1}^{n} \bar{f}_i. \]

In the general case, when for increasing $n$ the quantities $\bar{f}_i$
remain between two limits of the same sign, the above given
mean value is an infinitely large quantity of the order $n$.

In the most common case, when all the molecules, as well
as all $f_i$-functions, are identical, the quantity $\bar{f}$ is directly
proportional to $n$.

Let us now estimate the dispersion of the function $f$. We
have

\[ Df = \overline{(f - \bar{f})^2} = \overline{\Big[\sum_{i=1}^{n} (f_i - \bar{f}_i)\Big]^2} \]
\[ \quad = \sum_{i=1}^{n} \overline{(f_i - \bar{f}_i)^2} + \sum_{i \ne k} \overline{(f_i - \bar{f}_i)(f_k - \bar{f}_k)} \]
\[ \quad = \sum_{i=1}^{n} Df_i + \sum_{i \ne k} (Df_i\,Df_k)^{1/2}\,R(f_i, f_k). \]

Assuming the functions $f_i$ are subject to the limitations introduced
in the previous section, we obtain from the formula (91)

\[ Df = \sum_{i=1}^{n} Df_i - \frac{1}{B} \sum_{i \ne k} (b_i b_k\,Df_i\,Df_k)^{1/2}\,R(e_i, f_i)\,R(e_k, f_k) + O(n^{1/2}) \tag{92} \]

since the number of the terms in the second sum on the right
hand side is of the order of $n^2$ (in the above expression we
denoted by $e_i$ the energy of that molecule related to the function
$f_i$, and by $b_i$ the dispersion of this energy). The above
relation indicates that, under the assumed conditions:

\[ Df = O(n) \]

meaning that the mean square deviation of the function $f$ is
of order not greater than $n^{1/2}$ (i.e. considerably lower than the
mean value of that function). This fact establishes the “representability”
of the mean values of the sum functions, and
permits us to identify them with the time averages which
represent the direct results of any physical measurement.

Let us consider, as an example, the energy of a large component
of some given system, and let us assume that this
component contains the molecules with numbers from 1 to
$m$ ($< n$). We have $f_i = e_i$ ($1 \le i \le m$), $f_i = 0$ ($i > m$),
$f = \sum_{i=1}^{m} e_i$, and consequently $Df_i = b_i$ ($i \le m$), $Df_i = 0$
($i > m$), $R(e_i, f_i) = 1$. Formula (92) now yields

\[ Df = \sum_{i=1}^{m} b_i - \frac{1}{B} \sum_{\substack{i,k=1 \\ i \ne k}}^{m} b_i b_k + O(n^{1/2}). \]

Putting $\sum_{i=1}^{m} b_i = B'$, we can rewrite the above in the form

\[ Df = B' - \frac{1}{B}\Big(B'^2 - \sum_{i=1}^{m} b_i^2\Big) + O(n^{1/2}) \]

or, noticing that

\[ \sum_{i=1}^{m} b_i^2 = O(n), \]

in the form

\[ Df = \frac{B'(B - B')}{B} + O(n^{1/2}). \]

The main term in this expression is just the dispersion for the
Gaussian distribution, representing, approximately, the energy
distribution of the large component (comp. Chapter V, section
22).
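
The main term $B'(B - B')/B$ can be compared with an exact dispersion in a
model where the latter is known in closed form. The sketch below assumes,
for illustration only, $n$ identical molecules with structure function
proportional to $x^{a-1}$ (for instance $a = 3/2$ for a free monatomic
molecule); the energies divided by $E$ then follow a symmetric Dirichlet
distribution with parameter $a$, and the energy share of the first $m$
molecules follows a Beta distribution. The function names are hypothetical.

```python
def exact_dispersion(m, n, a, E):
    """Variance of e_1 + ... + e_m under the assumed Dirichlet model.

    (e_1 + ... + e_m) / E is Beta(m*a, (n-m)*a), whose variance is
    p*q / ((p+q)**2 * (p+q+1)).
    """
    p, q = m * a, (n - m) * a
    return E ** 2 * p * q / ((p + q) ** 2 * (p + q + 1))

def main_term(m, n, a, E):
    """B'(B - B')/B with b_i = a / theta**2 and theta = n*a / E."""
    theta = n * a / E
    b = a / theta ** 2
    B, B_prime = n * b, m * b
    return B_prime * (B - B_prime) / B

m, n, a, E = 300, 1000, 1.5, 1500.0
print(exact_dispersion(m, n, a, E), main_term(m, n, a, E))
```

In this model the two values differ by the factor $na/(na + 1)$, so the
difference stays bounded while both quantities grow in proportion to $n$.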

Let us also notice that in the most common case when all
the molecules and all the functions $f_i$ are identical ($f_i(P) =
\psi(P)$, $1 \le i \le n$) the formula (92) gives us:

\[ Df = n\,D\psi - \frac{b}{B}\,D\psi\,R^2(\psi, e)\,(n^2 - n) + O(n^{1/2}) \]
\[ \quad = n\,D\psi\,[1 - R^2(\psi, e)] + O(n^{1/2}), \tag{93} \]

where $e$ stands for the energy of a single molecule.

Let us turn now to a question concerning the distributions
of sum functions. As usual, we will consider this as a limit
problem, studying the form of the laws for $n \to \infty$.

We can expect without detailed calculation that we will
encounter here the same Gaussian distribution as in the particular
case of the energy distribution of a large component of
a given system (comp. section 22, chapter V). In fact,
any sum function represents the sum of an infinitely large
number of random quantities, and the interrelation of these
quantities is determined by the condition that the sum of the
energies of individual molecules is equal to the total energy $E$
of the system.

With an ever increasing number of molecules, the correlation
between the dynamical coordinates of any two of them becomes
very weak; we have seen, in fact, that the correlation
coefficient of two molecules tends to zero when $n \to \infty$. Hence
using a well known theorem of the theory of probability one
can expect that the distribution of the sum functions for a
large number of molecules will be, as a rule, similar to the
Gaussian distribution.

Let us consider the sum function $F = \sum_{i=1}^{n} f_i(q_i)$ where
$q_i$ represents the set of the dynamic coordinates of that molecule
associated with the function $f_i$. Let $v_i(x, y)$ be the volume
of that part of the phase space for this molecule, where $e_i < x$,
$f_i(q_i) < y$. Also put

\[ \omega_i(x, y) = \frac{\partial^2 v_i(x, y)}{\partial x\,\partial y}. \]

We will denote by $V(x, y)$ and $\Omega(x, y)$ the functions which are
determined in similar manner for the entire system (the functions
$f_i$ will, of course, be replaced by $F$). If we divide this
system into two components (characterized by the indices 1
and 2) one can easily see that

\[ V(a, b) = \int_{E < a,\; F < b} dV = \int V_1(a - E_2,\, b - F_2)\,dV_2
           = \iint V_1(a - x,\, b - y)\,\Omega_2(x, y)\,dx\,dy \]

where the double integral is extended over the entire
$(x, y)$ plane. This gives us

\[ \Omega(a, b) = \frac{\partial^2 V(a, b)}{\partial a\,\partial b} = \iint \Omega_2(x, y)\,\Omega_1(a - x,\, b - y)\,dx\,dy. \tag{94} \]

This formula, which is analogous to the law of composition
for structure functions (20), can be easily extended to an
arbitrary number of components.

Let us now try to express the distribution law of $F$ in terms
of the function $\Omega(x, y)$. It can be done most simply by referring
to the original geometrical meaning of the probability.
We have previously defined the probability of the relation

\[ b < F < b + \Delta b. \tag{95} \]

We are now interested in this probability as the area of that
part of the energy surface where this relation is fulfilled
approaches zero. For this purpose we have selected the particular
surface metrics for which the measure of any region of the surface
$\Sigma_a$ is equal to the limit (for $\Delta a \to 0$) of a ratio in which the
numerator is the volume measure of the layer $a < E < a + \Delta a$
located above the given region, and the denominator is equal to
$\Delta a$. In particular, the measure $\Omega(a)$ of the entire energy surface
is equal to $V'(a)$.

It follows that the probability of the relation (95) can be
determined geometrically as the limit (for $\Delta a \to 0$) of the
ratio of the volume of phase space for which $a < E < a + \Delta a$
and $b < F < b + \Delta b$, to the volume of the layer
$a < E < a + \Delta a$.

But the first of these two volumes is given by

\[ V(a + \Delta a,\, b + \Delta b) - V(a + \Delta a,\, b) - V(a,\, b + \Delta b) + V(a, b), \]

whereas the second is

\[ V(a + \Delta a) - V(a). \]

Dividing the numerator and denominator of this ratio by
$\Delta a$, we find that the probability in question is

\[ \lim_{\Delta a \to 0} \frac{[V(a + \Delta a,\, b + \Delta b) - V(a + \Delta a,\, b) - V(a,\, b + \Delta b) + V(a, b)]/\Delta a}{[V(a + \Delta a) - V(a)]/\Delta a}. \]

The probability density $p(a, b)$ is now obtained by dividing
the above expression by $\Delta b$ and putting $\Delta b \to 0$. This yields

\[ p(a, b) = \frac{\dfrac{\partial^2 V(a, b)}{\partial a\,\partial b}}{\dfrac{dV(a)}{da}} = \frac{\Omega(a, b)}{\Omega(a)}. \]

Starting from this basic formula, we will now obtain from it
asymptotic formulae which are more convenient in further
calculations. We will do this by using the composition law (94)
for the function $\Omega(x, y)$ in the same manner as for the structure
functions in chapter V.

Let us assume that our system consists of a large number $n$
of molecules, and let us put, as usual

\[ \varphi_i(\vartheta) = \int \omega_i(x)\,e^{-\vartheta x}\,dx \qquad (1 \le i \le n), \]
\[ \Phi(\vartheta) = \prod_{i=1}^{n} \varphi_i(\vartheta) = \int \Omega(x)\,e^{-\vartheta x}\,dx \]

where $\vartheta$ is determined by the relation $(d \log \Phi)/d\vartheta + E = 0$.
Let us also denote

\[ \frac{\omega_i(x, y)\,e^{-\vartheta x}}{\varphi_i(\vartheta)} = u_i(x, y) \quad (1 \le i \le n), \qquad
   \frac{\Omega(x, y)\,e^{-\vartheta x}}{\Phi(\vartheta)} = U(x, y). \tag{96} \]

Since

\[ \int \omega_i(x, y)\,dy = \omega_i(x) \quad (1 \le i \le n), \qquad
   \int \Omega(x, y)\,dy = \Omega(x),^{2} \tag{97} \]

it follows that the functions $u_i(x, y)$ and $U(x, y)$ represent the
probability densities of some two-dimensional distributions (the
analogue of the conjugate distributions).

Generalizing the composition law (94) for $n$ components we
obtain

\[ \Omega(a, b) = \int \!\cdots\! \int \Big\{ \prod_{i=1}^{n-1} \omega_i(x_i, y_i)\,dx_i\,dy_i \Big\}\,
   \omega_n\Big(a - \sum_{i=1}^{n-1} x_i,\; b - \sum_{i=1}^{n-1} y_i\Big). \]

Expressing the functions $\Omega$ and $\omega_i$ through the functions $U$
and $u_i$, we can easily find from the formulae (96) that:

\[ U(a, b) = \int \!\cdots\! \int \Big\{ \prod_{i=1}^{n-1} u_i(x_i, y_i)\,dx_i\,dy_i \Big\}\,
   u_n\Big(a - \sum_{i=1}^{n-1} x_i,\; b - \sum_{i=1}^{n-1} y_i\Big). \]

This relation shows that $U(x, y)$ is the probability density
of the two-dimensional distribution for the sum of $n$ mutually
independent terms distributed according to the densities
$u_i(x, y)$ ($1 \le i \le n$). Therefore, making certain assumptions
concerning the limiting behavior of the functions $\omega_i$ and $f_i$,
we can use the two-dimensional central limit theorem.³

²This last relation can be obtained, for example, by differentiating the
relation

\[ V(x) = V(x, +\infty) - V(x, -\infty) = \int \frac{\partial V(x, y)}{\partial y}\,dy. \]

³Although the proof of this theorem for this particular case has never
been published, we do not think it expedient to burden the present exposition
by its detailed presentation. Although this proof is very long, it does
not present any fundamental difficulties, and the reader can develop it
along the lines given in the appendix.

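For large n,

$$U(a, b) \approx \frac{1}{2\pi \Delta^{1/2}} \exp\Big\{ -\frac{1}{2\Delta} \big[ B_2 (a - A_1)^2 + A_2 (b - B_1)^2 - 2 C_2 (a - A_1)(b - B_1) \big] \Big\},$$

where

$$A_1 = \sum_{i=1}^{n} a_i, \qquad a_i = \iint x\, u_i(x, y)\, dx\, dy \quad (1 \le i \le n),$$

$$B_1 = \sum_{i=1}^{n} b_i, \qquad b_i = \iint y\, u_i(x, y)\, dx\, dy \quad (1 \le i \le n),$$

$$A_2 = \sum_{i=1}^{n} \iint (x - a_i)^2\, u_i(x, y)\, dx\, dy,$$

$$B_2 = \sum_{i=1}^{n} \iint (y - b_i)^2\, u_i(x, y)\, dx\, dy,$$

$$C_2 = \sum_{i=1}^{n} \iint (x - a_i)(y - b_i)\, u_i(x, y)\, dx\, dy,$$

$$\Delta = A_2 B_2 - C_2^2.$$

Because of the relations (96), (97):

$$A_1 = \sum_{i=1}^{n} \frac{1}{\varphi_i(\vartheta)} \iint x\, \omega_i(x, y)\, e^{-\vartheta x}\, dx\, dy = \sum_{i=1}^{n} \frac{1}{\varphi_i(\vartheta)} \int x\, \omega_i(x)\, e^{-\vartheta x}\, dx = -\sum_{i=1}^{n} \frac{\varphi_i'(\vartheta)}{\varphi_i(\vartheta)} = -\frac{d \log \Phi(\vartheta)}{d\vartheta} = E;$$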


To avoid any misunderstanding one should note that the notations
used in this derivation do not correspond to those which we used in the
one-dimensional case. Previously we used the letter A to symbolize
moments of the first order and the letters B and C (with different indices) to
symbolize the moments of the second order; now we use these indices to
indicate the order of the moments.


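therefore, for $a = E$,

$$U(E, b) \approx \frac{1}{2\pi \Delta^{1/2}} \exp\Big[ -\frac{A_2}{2\Delta} (b - B_1)^2 \Big]$$

and, because of (96),

$$\Omega(E, b) \approx \frac{\Phi(\vartheta)\, e^{\vartheta E}}{2\pi \Delta^{1/2}} \exp\Big[ -\frac{A_2}{2\Delta} (b - B_1)^2 \Big].$$

Remembering that according to (42)

$$\Omega(E) \approx \frac{\Phi(\vartheta)\, e^{\vartheta E}}{(2\pi A_2)^{1/2}},$$

we finally find

$$p(E, b) \approx \Big( \frac{A_2}{2\pi \Delta} \Big)^{1/2} \exp\Big[ -\frac{A_2}{2\Delta} (b - B_1)^2 \Big].$$

In other words the limiting form is nothing but a Gaussian
distribution with the center at $B_1$ and with the dispersion

$$\frac{\Delta}{A_2} = B_2 - \frac{C_2^2}{A_2}.$$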
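
The step from the two-dimensional Gaussian to the one-dimensional limiting form can be checked numerically. The following is only a minimal sketch: the function names and the sample values of $A_1, B_1, A_2, B_2, C_2$ are illustrative assumptions, not taken from the text. It evaluates the two-dimensional Gaussian density on the line $a = A_1$, normalizes it in $b$, and compares the resulting dispersion with $B_2 - C_2^2/A_2$.

```python
import math


def bivariate_density(a, b, A1, B1, A2, B2, C2):
    """Two-dimensional Gaussian with means (A1, B1), dispersions A2, B2, mixed moment C2."""
    delta = A2 * B2 - C2 ** 2
    q = (B2 * (a - A1) ** 2 + A2 * (b - B1) ** 2
         - 2.0 * C2 * (a - A1) * (b - B1))
    return math.exp(-q / (2.0 * delta)) / (2.0 * math.pi * math.sqrt(delta))


def slice_mean_and_dispersion(A1, B1, A2, B2, C2, steps=20001, width=12.0):
    """Mean and dispersion of b for the density restricted to a = A1 (Riemann sum)."""
    half = width * math.sqrt(B2)          # generous integration range in b
    h = 2.0 * half / (steps - 1)
    bs = [B1 - half + j * h for j in range(steps)]
    ws = [bivariate_density(A1, b, A1, B1, A2, B2, C2) for b in bs]
    total = sum(ws)                        # the step h cancels in the ratios below
    mean = sum(w * b for w, b in zip(ws, bs)) / total
    disp = sum(w * (b - mean) ** 2 for w, b in zip(ws, bs)) / total
    return mean, disp


if __name__ == "__main__":
    # Hypothetical sample values chosen only for illustration.
    mean, disp = slice_mean_and_dispersion(A1=10.0, B1=1.5, A2=4.0, B2=3.0, C2=2.0)
    print(mean, disp, 3.0 - 2.0 ** 2 / 4.0)
```

The normalization by the sum of the weights plays the role of the division by $\Omega(E)$ in the text.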


It remains to clarify the meaning of these parameters, and to 
see whether their values coincide with those which we obtained 
before. For this purpose we will show, first of all, that $u_i(x, y)$
is the density of the two-dimensional probability distribution
which governs (approximately) the pair of random quantities
$e_i$, $f_i$. In fact, since the set of dynamic coordinates of the
selected molecule is distributed in the phase space $\gamma_i$ of this
molecule according to the law characterized (approximately)
by the density $e^{-\vartheta e_i}/\varphi_i(\vartheta)$, the probability that the
inequalities $x < e_i < x + dx$ and $y < f_i < y + dy$ will be
satisfied simultaneously is equal to the integral of the above
expression, extended over that part of the phase space $\gamma_i$
where the above inequalities are satisfied. Since, however, the
integrated function can be considered as a constant (equal
to $e^{-\vartheta x}/\varphi_i(\vartheta)$) in that part of the space, and since the volume
of this region is given by
$[\partial^2 V_i(x, y)/\partial x\, \partial y]\, dx\, dy = \omega_i(x, y)\, dx\, dy$,
this probability becomes
$[e^{-\vartheta x}/\varphi_i(\vartheta)]\, \omega_i(x, y)\, dx\, dy = u_i(x, y)\, dx\, dy$.
This proves our statement. From this it follows that

$$A_2 = \sum_{i=1}^{n} De_i, \qquad B_2 = \sum_{i=1}^{n} Df_i, \qquad C_2 = \sum_{i=1}^{n} (De_i\, Df_i)^{1/2} R(e_i, f_i).$$


Therefore, for the dispersion associated with the limiting form
we get

$$B_2 - \frac{C_2^2}{A_2} = \sum_{i=1}^{n} Df_i - \frac{\Big[ \sum_{i=1}^{n} (De_i\, Df_i)^{1/2} R(e_i, f_i) \Big]^2}{\sum_{i=1}^{n} De_i}. \tag{98}$$

This expression coincides with the formula (92) (within the
limits specified in its derivation). The expression (98) also leads
to the formula (93) if one assumes that all the functions $f_i$
are identical, and all molecules have the same structure.

Let us also note that in the case when the functions $f_i$ are
not correlated with the energies $e_i$ of the corresponding
molecules (i.e. when all $R(e_i, f_i) = 0$) the expression (98) leads to
the value $\sum_{i=1}^{n} Df_i$ for the dispersion of the function F.
This fact could have been foreseen; in fact, we have already seen
that the stochastic dependence between the dynamic coordinates
of different molecules is due entirely to the condition
$\sum_{i=1}^{n} e_i = E$. It is clear, therefore, that the functions of the
dynamic coordinates of individual molecules, not being correlated
with the energies and therefore not being subject to the
above condition, must behave as non-correlated random
quantities. Thus the dispersion of their sum must be equal to
the sum of their dispersions.
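
As a small illustration of the expression (98), the sketch below evaluates it for given per-molecule dispersions and correlation coefficients. The function name and the sample numbers are assumptions made for this example only. With all correlation coefficients equal to zero the result reduces to the sum of the dispersions $Df_i$; with n identical molecules it reduces algebraically to $n\, Df\, (1 - R^2)$.

```python
import math


def limiting_dispersion(De, Df, R):
    """Evaluate B2 - C2^2 / A2 from per-molecule dispersions and correlations.

    De[i], Df[i] are the dispersions of e_i and f_i, and R[i] is their
    correlation coefficient (all three lists are hypothetical inputs).
    """
    A2 = sum(De)
    B2 = sum(Df)
    C2 = sum(math.sqrt(de * df) * r for de, df, r in zip(De, Df, R))
    return B2 - C2 ** 2 / A2


if __name__ == "__main__":
    n = 5
    # Uncorrelated case: the dispersion of the sum is the sum of the dispersions.
    print(limiting_dispersion([2.0] * n, [3.0] * n, [0.0] * n))
    # Identical molecules with correlation 0.5: n * Df * (1 - R^2).
    print(limiting_dispersion([2.0] * n, [3.0] * n, [0.5] * n))
```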


APPENDIX 


A proof of the central limit theorem of the theory of prob- 
ability. We find it necessary here to give a complete proof 
of the central limit theorem of the theory of probabilities,
because that form of this proof which is most convenient for
the purposes of statistical mechanics is somewhat different 
from the form usually encountered in mathematical texts. The 
point is, that in mathematics one naturally tends to formulate 
theorems in the most general way, sacrificing, thereby, con- 
siderations of the accuracy of the given estimates; in the case 
of the central limit theorem one tries to give a proof which 
would hold for the broadest possible class of initial distribu- 
tion functions, without giving much attention to the smallness 
of the higher order terms. On the other hand in the case of the 
statistical mechanics we can limit ourselves to comparatively 
“smooth” distributions, paying more attention to a detailed 
estimate of the secondary terms. 

This difference in the points of view results in a somewhat 
different treatment of the details of the proof, and prevents us 
from simply referring the reader to the standard mathematical 
texts. It may be noticed, however, that the general idea and 
the analytical method used in the proof remain essentially un- 
changed, so that the competent reader could actually do this 
himself. 

The central limit theorem. Suppose we have a sequence of
mutually independent random quantities which are governed
by distribution functions with the probability densities $u_k(x)$
(k = 1, 2, ...), and let

$$g_k(t) = \int e^{itx}\, u_k(x)\, dx \qquad (k = 1, 2, \ldots)$$

represent the characteristic functions corresponding to these
distribution laws. Let us assume that:

1. The functions $u_k(x)$ possess continuous derivatives, and
there exists a constant A such that

$$\int |u_k'(x)|\, dx < A \qquad (k = 1, 2, \ldots).$$

2. The functions $u_k(x)$ possess finite moments of the first five
orders which we will denote by $a_k$, $b_k$, $c_k$, $d_k$, $e_k$. Without
restricting the generality of our proof we can put $a_k = 0$
(k = 1, 2, ...); then there exist positive constants $\alpha$ and $\beta$ such that:

$$0 < \alpha < b_k < \beta, \quad \bar{c}_k < \beta, \quad d_k < \beta, \quad \bar{e}_k < \beta \qquad (k = 1, 2, \ldots),$$

where $\bar{c}_k$ and $\bar{e}_k$ represent the absolute moments of the third and
fifth order of the functions $u_k(x)$.

3. There exist positive constants a and b such that for $|t| < a$:

$$|g_k(t)| > b \qquad (k = 1, 2, \ldots).$$

4. For each interval $(c_1, c_2)$ $(c_1 c_2 > 0)$ there exists a number
$\rho(c_1, c_2) < 1$ such that for any t within the interval $(c_1, c_2)$ we
have:

$$|g_k(t)| < \rho \qquad (k = 1, 2, \ldots).$$

Let $U_n(x)$ be the probability density of the sum of the first n
terms in the given sequence of random quantities. Then, for
$n \to \infty$ and $|x| < 2 \log^2 n$, we have

$$U_n(x) = \frac{1}{(2\pi B_n)^{1/2}} \exp\Big[ -\frac{x^2}{2B_n} \Big] + \frac{S_n + T_n x}{B_n^{5/2}} + O\Big( \frac{1 + |x|^3}{n^2} \Big). \tag{99}$$

For any arbitrary x we have:

$$U_n(x) = \frac{1}{(2\pi B_n)^{1/2}} \exp\Big[ -\frac{x^2}{2B_n} \Big] + O\Big( \frac{1}{n} \Big),$$

where $B_n = \sum_{k=1}^{n} b_k$, and $S_n$ and $T_n$ are quantities independent
of x, increasing not faster than n.
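
The second statement of the theorem can be illustrated numerically. The sketch below is an assumption-laden example, not part of the proof: it takes every term of the sequence to be a centered Gamma variable with shape 3 (density $x^2 e^{-x}/2$ before centering, so that $b_k = 3$ and the density has a continuous derivative), for which the density of the sum is known in closed form, and compares it at $x = 0$ with the leading Gaussian term. The function names are hypothetical.

```python
import math


def gamma_sum_density(n, x, shape=3.0):
    """Exact density at x of the centered sum of n independent Gamma(shape, 1) terms."""
    k = shape * n                 # the sum is Gamma(k, 1) with mean k
    y = x + k                     # undo the centering
    if y <= 0.0:
        return 0.0
    return math.exp((k - 1.0) * math.log(y) - y - math.lgamma(k))


def gaussian_term(n, x, shape=3.0):
    """Leading term (2*pi*B_n)^(-1/2) * exp(-x^2 / (2*B_n)) with B_n = shape * n."""
    B = shape * n
    return math.exp(-x * x / (2.0 * B)) / math.sqrt(2.0 * math.pi * B)


def local_error(n, x=0.0):
    """Absolute difference between the exact density and the Gaussian term."""
    return abs(gamma_sum_density(n, x) - gaussian_term(n, x))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, local_error(n), 1.0 / n)
```

In this example the difference shrinks well within the $O(1/n)$ bound stated for arbitrary x.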




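The proof. We start with the Maclaurin formula

$$g_k(t) = 1 - \frac{b_k}{2} t^2 - \frac{i c_k}{6} t^3 + \frac{d_k}{24} t^4 + \frac{t^5}{120}\, g_k^{(5)}(\theta t), \qquad |\theta| < 1,$$

the validity of which is due in this case to the fact that the
existence of an absolute moment of a certain order implies the
existence of the corresponding derivative of the characteristic
function. Because of the postulate 2:

$$|g_k^{(5)}(\theta t)| \le \int |x|^5\, u_k(x)\, dx = \bar{e}_k < \beta \qquad (k = 1, 2, \ldots).$$

Denoting by $\psi_k(t) = \log g_k(t)$ that branch of the logarithmic
function which passes through zero for $t = 0$, we can easily
prove that for $t \to 0$:

$$\psi_k(t) = -\frac{b_k}{2} t^2 - \frac{i c_k}{6} t^3 + \frac{1}{24} (d_k - 3 b_k^2)\, t^4 + O(t^5), \tag{100}$$

which holds uniformly with respect to k. Let us put:

$$\sum_{k=1}^{n} b_k = B_n, \qquad \sum_{k=1}^{n} c_k = C_n, \qquad \sum_{k=1}^{n} (d_k - 3 b_k^2) = D_n,$$

and substitute in the previous formula $t = u/(B_n)^{1/2}$. We will
always assume in future that $u = O(\log n)$. Taking the sum
of (100) for k from 1 to n, we find:

$$\sum_{k=1}^{n} \psi_k\Big( \frac{u}{(B_n)^{1/2}} \Big) = -\frac{u^2}{2} - \frac{i C_n}{6 B_n^{3/2}} u^3 + \frac{D_n}{24 B_n^2} u^4 + O\Big( \frac{|u|^5}{n^{3/2}} \Big),$$

which can be reduced to

$$\prod_{k=1}^{n} g_k\Big( \frac{u}{(B_n)^{1/2}} \Big) = e^{-u^2/2} \Big\{ 1 - \frac{i C_n}{6 B_n^{3/2}} u^3 + \frac{D_n}{24 B_n^2} u^4 - \frac{C_n^2}{72 B_n^3} u^6 + O\Big( \frac{|u|^5 + |u|^9}{n^{3/2}} \Big) \Big\}.$$

We rewrite this relation as:

$$\prod_{k=1}^{n} g_k\Big( \frac{u}{(B_n)^{1/2}} \Big) = e^{-u^2/2} \Big\{ 1 + i K_n B_n^{-3/2} u^3 + L_n B_n^{-2} u^4 - M_n^2 B_n^{-3} u^6 + O\Big( \frac{|u|^5 + |u|^9}{n^{3/2}} \Big) \Big\}, \tag{101}$$

where (as always in the future) each capital letter with the
index n denotes the sum of n real numbers, independent of u,
which form bounded sets for $n \to \infty$.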
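
The coefficients in the expansion of the logarithm of a characteristic function (second moment, third moment, and the combination $d_k - 3b_k^2$ of the fourth and second moments) can be checked on a concrete case. The sketch below assumes, purely for illustration, a centered exponential variable with density $e^{-(x+1)}$ for $x > -1$, whose characteristic function is $e^{-it}/(1 - it)$ and whose central moments are $b = 1$, $c = 2$, $d = 9$. This density does not satisfy postulate 1; it is used here only to check the algebra of the expansion, and the function names are hypothetical.

```python
import cmath


def char_fn(t):
    """Characteristic function of a centered exponential variable (illustrative choice)."""
    return cmath.exp(-1j * t) / (1.0 - 1j * t)


def log_expansion(t, b=1.0, c=2.0, d=9.0):
    """Truncated expansion -b t^2/2 - i c t^3/6 + (d - 3 b^2) t^4/24."""
    return -b * t ** 2 / 2.0 - 1j * c * t ** 3 / 6.0 + (d - 3.0 * b ** 2) * t ** 4 / 24.0


def expansion_error(t):
    """Modulus of the remainder; for small t it should behave like a multiple of t^5."""
    return abs(cmath.log(char_fn(t)) - log_expansion(t))


if __name__ == "__main__":
    for t in (0.2, 0.1, 0.05):
        print(t, expansion_error(t), t ** 5)
```

Halving t reduces the remainder by a factor close to 32, as expected of a term of order $t^5$.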
As is well known from the general theory of characteristic
functions

$$U_n(x) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-itx} \Big\{ \prod_{k=1}^{n} g_k(t) \Big\}\, dt.$$
We will divide the interval of integration into two parts,
defined by $|t| < \log n/(B_n)^{1/2}$ and $|t| > \log n/(B_n)^{1/2}$, and
will write $I_1$ and $I_2$ for the values of the corresponding partial
integrals. In order to estimate the value of $I_2$ we will use a
shorter form of the Maclaurin formula

$$g_k(t) = 1 - \frac{t^2}{2} b_k + \frac{t^3}{6}\, g_k'''(\theta t), \qquad |\theta| < 1.$$

Since

$$|g_k'''(\theta t)| \le \int |x|^3\, u_k(x)\, dx = \bar{c}_k < \beta,$$

we obtain

$$|g_k(t)| \le \Big| 1 - \frac{t^2}{2} b_k \Big| + \frac{\beta}{6} |t|^3.$$

Remembering that $1 - z \le e^{-z}$ for any real z, and assuming
$t^2 < 2/\beta < 2/b_k$ (k = 1, 2, ...), i.e. that $1 - t^2 b_k/2 > 0$
(k = 1, 2, ...), we have:

$$|g_k(t)| \le \exp\Big[ -\frac{t^2}{2} b_k + \frac{\beta}{6} |t|^3 \Big] \qquad (k = 1, 2, \ldots).$$

Assuming further that $|t|$ is so small that the second term in the
exponent is smaller than half of the first term, we have:

$$|g_k(t)| < \exp\Big[ -\frac{t^2}{4} b_k \Big].$$

Since the quantities $b_k$ are bounded from below (postulate 2),
we conclude that, for sufficiently small t, the above inequality
will hold for any k; thus,

$$\Big| \prod_{k=1}^{n} g_k(t) \Big| < \exp(-t^2 B_n/4).$$


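Let this relation hold for $|t| < \delta$; then, for sufficiently large n,

$$\frac{\log n}{(B_n)^{1/2}} < \delta$$

and

$$\Big| \int_{\log n/(B_n)^{1/2}}^{\delta} e^{-itx} \Big\{ \prod_{k=1}^{n} g_k(t) \Big\}\, dt \Big| < \int_{\log n/(B_n)^{1/2}}^{\infty} e^{-B_n t^2/4}\, dt = B_n^{-1/2} \int_{\log n}^{\infty} e^{-u^2/4}\, du = O(\exp[-\tfrac{1}{4} \log^2 n]) = O(n^{-2}). \tag{102}$$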


For an estimate of other parts of the integral let us notice that
because of the postulate 1

$$|g_k(t)| = \Big| \int e^{itx}\, u_k(x)\, dx \Big| = \Big| \Big[ \frac{e^{itx}}{it}\, u_k(x) \Big]_{-\infty}^{+\infty} - \frac{1}{it} \int e^{itx}\, u_k'(x)\, dx \Big| < \frac{A}{|t|}, \tag{103}$$

since $u_k(\pm\infty) = 0$.¹ Because of the postulate 4
there must exist a number $\rho$, $0 < \rho < 1$, such that for
$\delta \le t \le 2A$

$$|g_k(t)| < \rho < 1 \qquad (k = 1, 2, \ldots),$$

so that

$$\Big| \int_{\delta}^{2A} e^{-itx} \Big\{ \prod_{k=1}^{n} g_k(t) \Big\}\, dt \Big| < \rho^n (2A - \delta) = O(n^{-2}),$$

whereas, because of (103) we have

$$\Big| \int_{2A}^{\infty} e^{-itx} \Big\{ \prod_{k=1}^{n} g_k(t) \Big\}\, dt \Big| < \int_{2A}^{\infty} \Big( \frac{A}{t} \Big)^n dt = O(n^{-2}).$$

Since similar estimates can be obtained for the corresponding parts
of the region $t < 0$, the relation (102) in conjunction with the
last two relations gives us

$$I_2 = O(n^{-2}),$$

which holds uniformly with respect to x.

¹The existence of the limits $u_k(+\infty)$ and $u_k(-\infty)$ follows from the
integrability of the function $|u_k'(x)|$. The integrability of the function $u_k(x)$
itself leads us to the conclusion that in the limit the two above values must
vanish.
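Let us turn now to an asymptotic estimate of the integral

$$I_1 = \frac{1}{2\pi} \int_{|t| < (\log n)/(B_n)^{1/2}} e^{-itx} \Big\{ \prod_{k=1}^{n} g_k(t) \Big\}\, dt = \frac{1}{2\pi (B_n)^{1/2}} \int_{-\log n}^{\log n} \exp\Big[ -\frac{ixu}{(B_n)^{1/2}} \Big] \Big\{ \prod_{k=1}^{n} g_k\Big( \frac{u}{(B_n)^{1/2}} \Big) \Big\}\, du.$$

Replacing the product under the integral by its asymptotic
expression (101), we obtain

$$\begin{aligned}
I_1 &= \frac{1}{2\pi (B_n)^{1/2}} \int_{-\log n}^{\log n} \exp\Big[ -\frac{ixu}{(B_n)^{1/2}} - \frac{u^2}{2} \Big] \Big\{ 1 + i K_n B_n^{-3/2} u^3 + L_n B_n^{-2} u^4 - M_n^2 B_n^{-3} u^6 + O\Big( \frac{|u|^5 + |u|^9}{n^{3/2}} \Big) \Big\}\, du \\
&= \frac{1}{2\pi (B_n)^{1/2}} \int_{-\log n}^{\log n} \exp\Big[ -\frac{ixu}{(B_n)^{1/2}} - \frac{u^2}{2} \Big]\, du \\
&\quad + \frac{i K_n}{2\pi B_n^{2}} \int_{-\log n}^{\log n} u^3 \exp\Big[ -\frac{ixu}{(B_n)^{1/2}} - \frac{u^2}{2} \Big]\, du \\
&\quad + \frac{L_n}{2\pi B_n^{5/2}} \int_{-\log n}^{\log n} u^4 \exp\Big[ -\frac{ixu}{(B_n)^{1/2}} - \frac{u^2}{2} \Big]\, du \\
&\quad - \frac{M_n^2}{2\pi B_n^{7/2}} \int_{-\log n}^{\log n} u^6 \exp\Big[ -\frac{ixu}{(B_n)^{1/2}} - \frac{u^2}{2} \Big]\, du + O(n^{-2}).
\end{aligned}$$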


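We will denote the first four terms on the right hand side by
$A_1$, $A_2$, $A_3$, $A_4$. Since $e^{-u^2/2}$ is known to be the characteristic
function of the Gaussian distribution with center at zero and
with dispersion 1, we have

$$\frac{1}{2\pi (B_n)^{1/2}} \int_{-\infty}^{+\infty} \exp\Big[ -\frac{ixu}{(B_n)^{1/2}} - \frac{u^2}{2} \Big]\, du = \frac{1}{(2\pi B_n)^{1/2}} \exp\Big[ -\frac{x^2}{2B_n} \Big],$$

and since

$$\Big| \frac{1}{2\pi (B_n)^{1/2}} \int_{|u| > \log n} \exp\Big[ -\frac{ixu}{(B_n)^{1/2}} - \frac{u^2}{2} \Big]\, du \Big| \le \frac{1}{2\pi (B_n)^{1/2}} \int_{|u| > \log n} e^{-u^2/2}\, du = O(n^{-2}),$$

we obtain finally

$$A_1 = \frac{1}{(2\pi B_n)^{1/2}} \exp\Big[ -\frac{x^2}{2B_n} \Big] + O(n^{-2}). \tag{104}$$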


For an estimate of the remaining three integrals we will
introduce the notation

$$\int_{-\infty}^{+\infty} u^r e^{-u^2/2}\, du = m_r \qquad (r = 1, 2, \ldots).$$

Let us notice that, for sufficiently large $|u|$,

$$|u|^r < e^{u^2/4},$$

so that for sufficiently large n

$$\int_{|u| > \log n} |u|^r e^{-u^2/2}\, du < \int_{|u| > \log n} e^{-u^2/4}\, du < \exp[-\tfrac{1}{4} \log^2 n] = O(n^{-2}). \tag{105}$$

We will also assume that $|x| < 2 \log^2 n$.
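
The quantities $m_r$ have the well-known closed form $m_r = (2\pi)^{1/2} (r - 1)!!$ for even r and $m_r = 0$ for odd r, so that $m_4 = 3(2\pi)^{1/2}$ and $m_6 = 15(2\pi)^{1/2}$. The following sketch, with hypothetical function names and a simple trapezoid rule chosen only for illustration, checks these values and the kind of tail bound used in (105) for one sample cutoff.

```python
import math


def trapezoid(f, lo, hi, steps=24001):
    """Plain trapezoid rule on [lo, hi] with a fixed number of grid points."""
    h = (hi - lo) / (steps - 1)
    total = 0.5 * (f(lo) + f(hi))
    for j in range(1, steps - 1):
        total += f(lo + j * h)
    return total * h


def moment(r, limit=12.0):
    """Approximate m_r, the integral of u^r exp(-u^2/2) over the real line."""
    return trapezoid(lambda u: u ** r * math.exp(-u * u / 2.0), -limit, limit)


def tail(r, cutoff, extra=12.0):
    """Approximate the integral of |u|^r exp(-u^2/2) over |u| > cutoff."""
    one_side = trapezoid(lambda u: u ** r * math.exp(-u * u / 2.0),
                         cutoff, cutoff + extra)
    return 2.0 * one_side


if __name__ == "__main__":
    root = math.sqrt(2.0 * math.pi)
    print(moment(4), 3.0 * root)
    print(moment(6), 15.0 * root)
    # Tail bound of the type (105) for the sample cutoff 6 (an arbitrary choice).
    print(tail(4, 6.0), math.exp(-6.0 ** 2 / 4.0))
```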
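For an estimate of $A_2$ we notice that

$$\int_{-\log n}^{\log n} u^3 \exp\Big[ -\frac{ixu}{(B_n)^{1/2}} - \frac{u^2}{2} \Big]\, du = -i \int_{-\log n}^{\log n} u^3 e^{-u^2/2} \sin\frac{xu}{(B_n)^{1/2}}\, du = -\frac{ix}{(B_n)^{1/2}} \int_{-\log n}^{\log n} u^4 e^{-u^2/2}\, du + O(|x|^3 n^{-3/2}).$$

Using the estimate (105), and substituting into $A_2$, we find

$$A_2 = \frac{m_4 K_n x}{2\pi B_n^{5/2}} + O\Big( \frac{1 + |x|^3}{n^{5/2}} \Big). \tag{106}$$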

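In a similar manner the estimate

$$\int_{-\log n}^{\log n} u^4 e^{-u^2/2} \cos\frac{xu}{(B_n)^{1/2}}\, du = m_4 + O\Big( \frac{1 + x^2}{n} \Big)$$

gives us

$$A_3 = \frac{m_4 L_n}{2\pi B_n^{5/2}} + O\Big( \frac{1 + x^2}{n^{5/2}} \Big). \tag{107}$$

We also get

$$A_4 = -\frac{m_6 M_n^2}{2\pi B_n^{7/2}} + O\Big( \frac{1 + x^2}{n^{5/2}} \Big). \tag{108}$$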

Collecting the estimates (104), (106), (107), (108), and
remembering that $I_2 = O(n^{-2})$, we find

$$U_n(x) = \frac{1}{(2\pi B_n)^{1/2}} \exp\Big[ -\frac{x^2}{2B_n} \Big] + \frac{m_4 K_n x + m_4 L_n - m_6 M_n^2 B_n^{-1}}{2\pi B_n^{5/2}} + O\Big( \frac{1 + |x|^3}{n^2} \Big), \tag{109}$$

which proves the first part of our theorem. To prove the
second part of the theorem it is sufficient to notice that the
integrals in $A_2$, $A_3$ and $A_4$ remain bounded uniformly with
respect to x $(-\infty < x < +\infty)$ when $n \to \infty$, and that the
estimates of $I_2$ and $A_1$ are also uniform with respect to x.

Remark. For many applications of the central limit theorem
in the problems of statistical mechanics one often must use,
together with the formula (109), a similar formula for $U_{n'}(x)$
where $n'$ is so close to n that the difference $n' - n$ remains
bounded for $n \to \infty$ (we often have simply $n' = n - 1$ or
$n' = n - 2$). In these cases it is useful to remember that in writing
the expression for $U_{n'}(x)$ it is not necessary to replace n by
$n'$ in all capital letters on the right hand side of (109). In
particular, it suffices to substitute $B_{n'}$ for $B_n$ in the radical of
the first term, leaving all other indices unchanged. Thus, for
$|x| < 2 \log^2 n$ we can write

$$U_{n'}(x) = \frac{\exp\big[ -x^2/(2B_n) \big]}{(2\pi B_{n'})^{1/2}} + \frac{m_4 K_n x + m_4 L_n - m_6 M_n^2 B_n^{-1}}{2\pi B_n^{5/2}} + O\Big( \frac{1 + |x|^3}{n^2} \Big).$$

In fact, a simple calculation, which we leave to the reader,
shows that substituting $n'$ for all indices n, we change our
expression (because of the boundedness of $B_n - B_{n'}$, $K_n - K_{n'}$,
$L_n - L_{n'}$, $M_n - M_{n'}$) only by an infinitesimally small quantity
of an order of magnitude lower than that of the remaining
term. Thus, we can use or omit such a substitution at our
convenience.